GROWTH IN GROUPS: IDEAS AND PERSPECTIVES 

H. A. HELFGOTT 

Abstract. This is a survey of methods developed in the last few years to prove 
results on growth in non-commutative groups. These techniques have their roots 
in both additive combinatorics and group theory, as well as other fields. We 
discuss linear algebraic groups, with SL2(Z/pZ) as the basic example, as well as 
permutation groups. The emphasis will lie on the ideas behind the methods. 

In memory of Akos Seress (1958-2013) 
1. Introduction 

1.1. Main questions. Previous situation. Let A be a finite subset of a group G.
What can one say about the size^1 of A · A = {xy : x, y ∈ A}? What about the size
of A^k = {x_1 ··· x_k : x_i ∈ A}? What can one say about the distribution of x_1 ··· x_k,
when x_1, ..., x_k are taken at random within A? How large must k be before one can
express every element of G in the form x_1 ··· x_k, x_i ∈ A?

All of these are questions on growth in groups. Until rather recently, such questions 
were treated within separate areas in mathematics, with disparate sets of tools: 

(a) Additive combinatorics. This is a relatively recent name for a field one of
whose starting points is the work of Freiman ([Fre73]; see also [Ruz91]) classifying
subsets A of Z such that A + A is not much larger than A. Work by
Ruzsa and others ([Plü70], [Ruz89], [RT85]) showed how the size of A + A
relates to the size of A + A + A, A + A + A + A, and so on. In general, additive
combinatorics treated abelian groups, even if some of its techniques turned
out to generalize to non-abelian groups rather naturally (see, e.g., [Tao08]).

(b) Mixing times and diameters. Let A be a set of generators of a finite group G.
The mixing time is the least k such that, when x_1, ..., x_k are taken uniformly
and at random within A, the distribution of the product x_1 ··· x_k is close to
uniform in G. (We speak of ℓ_2 mixing time, ℓ_∞ mixing time, etc., depending
on the norm used to define "close to".) Here most work has focused on
permutation groups, with a strong probabilistic flavor: see [BBS04], [BH05],
[BH92], [DS81], [DSC93] as well as [LPW09] and references therein.

There is also the related question of the diameter, defined to be the least
k such that every element g of G can be written as g = x_1 x_2 ··· x_r for some
x_i ∈ A, r ≤ k. Babai's conjecture [BS88, p. 176] posits that, if G is simple
and non-abelian, the diameter is always small, that is to say, (log |G|)^{O(1)}.

2010 Mathematics Subject Classification. Primary: 20F69; Secondary: 20D60, 11B30, 20B05. 
^1 By the size or cardinality of a finite set S we mean simply its number of elements. We denote
the number of elements of S by |S|.

For the alternating group G = Alt(n), this was a folklore conjecture; work
towards it includes [BS88], [BKL89] and [BBS04].
(c) Expanders, spectral gaps and property (T). For A a set of generators of a finite
group G, we say that the pair (G, A) gives us an ε-expander if every subset S
of G with |S| ≤ |G|/2 satisfies |S ∪ AS| ≥ (1 + ε)|S|. An ε-expander always
has very small diameter and, if the identity e is in A, also has very small ℓ_2
and ℓ_∞ mixing time. (Here "very small" means "logarithmic in |G|".)

(Alternatively, we can define the expander property in terms of the size of
the second largest eigenvalue λ_1 of the adjacency matrix^2 𝒜 (or the smallest
non-zero eigenvalue of the discrete Laplacian Δ = I − 𝒜) of the Cayley graph
Γ(G, A), which is the graph having G as its set of vertices and {(g, ag) : g ∈
G, a ∈ A} as its set of edges; 𝒜 is defined as in (4.9). We say that Γ(G, A)
is an ε-expander if λ_1 ≥ ε. This is equivalent to the above for |A| bounded
(though ε is replaced by ε^2 in one of the two directions of the equivalence).)

It had long been known^3 that the fact that the Laplacian on the surface^4
Γ(N)\ℍ has a spectral gap ([Sel65]; a key result in the theory of modular
forms) implies that the pairs (SL2(Z/pZ), A) with

(1.1)    A = \left\{ \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \right\}

are a family of expanders, i.e., are ε-expanders for some fixed ε > 0. Before
[Hel08] and [BG08c], little was known for more general A; e.g., for



(1.2)    A = \left\{ \begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 3 & 1 \end{pmatrix} \right\}

there were no good diameter bounds, let alone a proof that (SL2(Z/pZ), A)
is a family of expanders. (This is a favorite example of Lubotzky's.)

For G = SL_n(Z/pZ), n ≥ 3, the proof of expansion for some A was arguably
more direct (due to property (T), for which relatively elementary
proofs were known [Kaz67]), but the case of general A was open, just as for
n = 2.

Kassabov applied what was known for SL_n (and linear algebraic groups in
general) to prove the existence of expanders for the symmetric group [Kas07].



^2 We use the normalized adjacency matrix, defined to be the operator 𝒜 that maps a function f on
the set of vertices V of a graph to the function 𝒜f on V whose value at v is the average of f(w)
on the neighbors w of v. For a Cayley graph, this is the same as (4.9).

^3 Tracing the statement is non-trivial. The correspondence was shown for compact quotients
in Buser's work [Bus78]. See also Brooks [Bro86] (still for the compact case) and [Bro87]
(non-compact case, in terms of the Kazhdan constant) and Burger [Bur86] (compact case, in terms of
eigenvalues). What is a little harder to pinpoint is the first proof for the non-compact case in terms
of eigenvalues. (At least some proofs for the compact case do generalize to the non-compact case -
see, e.g., [EHK12, App. A], based on Burger's approach - but this seems not to have been obvious at
first.) A. Lubotzky and P. Sarnak (private communication) state that the work leading to [LPS88]
was originally centered on SL2(Z/pZ) and Γ(N)\ℍ and, in particular, showed the correspondence in
this (non-compact) case. Thanks are due to them and to E. Kowalski for several references.

^4 Meaning the quotient of the upper half plane ℍ by the action of Γ(N) = {g ∈ SL2(Z) : g ≡
I mod N}, where Γ(N) ≤ SL2(Z) acts on ℍ by fractional linear transformations.

Other relevant works are [SX91] (giving an elementary treatment of expansion
for A as in (1.1), and, in general, "arithmetic lattices"^5 in SL2), [Sha97],
[Sha99], [Gam02] (strengthening and generalizing [SX91] to some infinite-index
groups), and [GS04] and [Din06] (both of them influenced by the
Solovay-Kitaev algorithm, as in [NC00, App. 3]; see [Var] for recent work
on this line).

(d) Group theory: subgroup classification. If e ∈ A, the extreme case |A · A| = |A|
happens exactly when A is a subgroup of G. There are results on subgroup
classification from the 80s and 90s intended to bypass parts of the Classification
of Finite Simple Groups by elementary arguments. Several results of
this kind ([LP11]^6, [Bab82], [Pyb93]) later played an important role in the
study of growth results: their techniques for studying sets with |A · A| = |A|
(that is, subgroups) turned out to be robust enough to extend to the study
of sets for which |A · A| is not much larger than |A|.

(e) Asymptotic group theory. Model theory. If G is infinite, it makes sense to ask
how |A^k| grows as k → ∞. One of the main results here is Gromov's theorem
[Gro81]. The work of Hrushovski and his collaborators, culminating in
[Hru12] (see also [HP95] and [HW08]), used model theory to study subgroups
of algebraic groups, recovering and extending Larsen-Pink's estimates [LP11]
(among other results), and, in due course, giving a new proof of Gromov's
theorem [Hru12].
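The notions of diameter and mixing time in (b) can be made concrete on a very small case. The following is only a minimal sketch under stated assumptions, not a method from any of the works cited: it takes p = 5, uses the generators in (1.1) together with their inverses and the identity (so that the walk is lazy), computes the diameter by breadth-first search, and tracks the ℓ_2 distance to the uniform distribution along the walk. The names mul, bfs_distances, walk_step and l2_distance_to_uniform are hypothetical helpers introduced here for illustration.

```python
from collections import deque

P = 5  # a small prime, chosen only so that the example runs instantly

def mul(g, h, p=P):
    """Multiply 2x2 matrices stored as tuples (a, b, c, d), entries mod p."""
    a, b, c, d = g
    e, f, u, v = h
    return ((a*e + b*u) % p, (a*f + b*v) % p, (c*e + d*u) % p, (c*f + d*v) % p)

E = (1, 0, 0, 1)
# Generators as in (1.1), their inverses, and the identity (assumption: lazy walk).
A = [E, (1, 1, 0, 1), (1, P - 1, 0, 1), (1, 0, 1, 1), (1, 0, P - 1, 1)]

def bfs_distances(gens):
    """Word length of every reachable element with respect to gens."""
    dist = {E: 0}
    queue = deque([E])
    while queue:
        g = queue.popleft()
        for a in gens:
            h = mul(a, g)
            if h not in dist:
                dist[h] = dist[g] + 1
                queue.append(h)
    return dist

def walk_step(mu, gens):
    """One step of the random walk: convolve mu with the uniform measure on gens."""
    nu = {}
    for g, w in mu.items():
        for a in gens:
            h = mul(a, g)
            nu[h] = nu.get(h, 0.0) + w / len(gens)
    return nu

def l2_distance_to_uniform(mu, group_size):
    """l2 distance between mu and the uniform distribution on the whole group."""
    u = 1.0 / group_size
    seen = sum((w - u) ** 2 for w in mu.values())
    unseen = (group_size - len(mu)) * u * u
    return (seen + unseen) ** 0.5

dist = bfs_distances(A)
group_size = len(dist)          # number of elements generated by A
diameter = max(dist.values())   # least k with every element a word of length <= k

mu = {E: 1.0}
distances = []
for _ in range(40):
    mu = walk_step(mu, A)
    distances.append(l2_distance_to_uniform(mu, group_size))
```

Printing diameter and the list distances shows how quickly the walk flattens out for this one small group; nothing here says anything about uniformity in p, which is the actual content of the expansion results discussed in (c).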

The overall landscape shifted due to a quick succession of developments starting in 
2005 with the prepublication of [Hel08], followed quickly by [BG08c] and a series of
papers by many authors. Ideas from all of the above fields are interacting in many 
ways, yielding results far stronger and more general than many of those known 
before. This is the topic of this survey. 

Our focus will lie on the ideas behind the main growth results, going from 
SL2(Z/pZ) ([Hel08], reexamined in the light of [Hel11], [BGT11], [PSa]) up to new
work on the symmetric group [HS]. We will spend less time on the applications of
these results to expander graphs, as that has been nicely covered elsewhere ([Kowc],
[Lub12]). (See also the notes by Kowalski [Kowa] and Tao [Tao].)

Several main themes run through proofs that seem very different on the technical 
surface. One of them is the idea of stable configurations under group actions as the 
main object of study. It is this, with slowly growing sets (approximate subgroups) as 
a special case, that encapsulates not just the results but a great deal of the approach 
to proving them. 

1.2. Main results covered. 

1.2.1. Growth in linear groups. One of our goals will be to show the main ideas 
behind proofs of growth in linear groups of bounded rank. In particular, we will 
give most of the details of what amounts to an "up-to-date" proof of the following 



^5 For more on the case of arithmetic lattices (a case that covers (1.1) but not (1.2)), see the
references in [GV12, §1.1].

^6 Circulated in preprint form since ca. 1998.

result, in such a way that the proof generalizes naturally. (In other words, it will
incorporate ideas from the series of developments to which the first proof gave rise.)

Theorem 1.1 (Helfgott [Hel08]). Let G = SL2(F_p). Let A ⊂ G generate G. Then
either

    |A^3| \geq |A|^{1+\delta}

or (A ∪ A^{-1} ∪ {e})^k = G, where δ > 0 and k ≥ 1 are absolute constants.

Here, as usual, "absolute constant" means "really a constant"; in particular, the
constants do not depend on p. Nikolov-Pyber [NP11] (following Gowers [Gow08];
see also [BNP08]) showed one can replace (A ∪ A^{-1} ∪ {e})^k by A^3. Kowalski [Kowb]
has shown one can take δ = 1/3024 (assuming A = A^{-1}, e ∈ A; we will see in §2.1
that these are very "light" assumptions).
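The dichotomy in Theorem 1.1 can be watched on a toy case. The sketch below is an illustration only, under assumptions that are not in the text: it fixes p = 7, takes A to be the reduction mod p of the set in (1.2) together with inverses and the identity, and records |A|, |A^3|, |A^9|, ... until the set stops growing. The helper names mul, product_set and growth_table are hypothetical. A single small prime says nothing about the constants δ and k, which are the point of the theorem.

```python
import math

P = 7  # a small prime, chosen only to keep the computation tiny

def mul(g, h, p=P):
    """Multiply 2x2 matrices stored as tuples (a, b, c, d), entries mod p."""
    a, b, c, d = g
    e, f, u, v = h
    return ((a*e + b*u) % p, (a*f + b*v) % p, (c*e + d*u) % p, (c*f + d*v) % p)

def product_set(X, Y):
    """The set X . Y = {xy : x in X, y in Y}."""
    return {mul(x, y) for x in X for y in Y}

E = (1, 0, 0, 1)
# The set in (1.2) reduced mod P, made symmetric and containing the identity.
A = {E, (1, 3, 0, 1), (1, P - 3, 0, 1), (1, 0, 3, 1), (1, 0, P - 3, 1)}

def growth_table(A, max_steps=12):
    """Sizes |A|, |A^3|, |A^9|, ... recorded until the set stops growing."""
    sizes = []
    S = set(A)
    for _ in range(max_steps):
        sizes.append(len(S))
        T = product_set(product_set(S, S), S)
        if len(T) == len(S):
            break
        S = T
    return sizes

sizes = growth_table(A)
# Exponent e with |S^3| = |S|^e at each tripling step.
exponents = [math.log(b) / math.log(a) for a, b in zip(sizes, sizes[1:])]
```

Since e ∈ A, the set stops growing under tripling only once it is a subgroup containing A, that is, all of SL2(F_7); the last entry of sizes is therefore the order of the group.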

There are two kinds of generalizations. 

(a) Changing the field. Dinai [Din11] and Varjú [Var12] showed F_p can be
changed to F_q. The proof in [Hel08] easily gives that Thm. 1.1 still holds if F_p
is changed to C and A is taken to be finite.^7 However, for applications, one
often needs a stronger generalization, where the measure of a general set A
(and of A^3) is considered. This was done in [BG08b] for SU(2), which has the
same Lie algebra type as SL2: the main idea is to redo the proof in [Hel08],
still for finite sets, but keeping track of distances (e.g., where [Hel08] uses
that a map is injective, [BG08b] also checks that the map does not shrink
distances by more than a constant factor).

(b) Changing the Lie type. The generalization of Thm. 1.1 to SL3(F_p) [Hel11]
was neither easy nor limited to SL3 alone; it involved general work on tori,
conjugacy classes and slowly growing sets ([Hel11, §5] does this for SL_n) as
well as a new level of abstraction, taking ideas from sum-product theorems
(pivoting, discussed later) to the context of actions of groups on groups. Much of
the rest of the generalization to SL_n was carried out in [GH11], but, for
instance, SO_n (n large) resisted (and was an obstruction to a full solution
for SL_n). Full and elegant generalizations to all finite simple groups of Lie
type (with bounds depending on the rank) were given by Pyber and Szabó
[PSa] and, independently, by Breuillard, Green and Tao [BGT11]: this, of
course, covered the classical groups SL_n(F_q), SO_n(F_q), Sp_{2n}(F_q), with δ > 0
depending on n. (The issue of the dependence on n is important; we will
discuss it in some detail later.)

^7 In fact, in that case, there is a predecessor: Elekes and Király proved [EK01] a result corresponding
to Thm. 1.1 with R instead of F_p, and unspecified growth bounds. In general, in arithmetic
combinatorics, results over R or C are more accessible than results over F_p: R has an ordering
and a topology that a general field, or F_p in particular, does not have. See the discussion on the
sum-product theorem at the beginning of §2.3 for a relevant instance of this. Part of the merit of
[BG08b] is precisely that it proves a result over C that is of the same order of difficulty as Thm. 1.1.
(Note that [Cha08] gives (a) a simplified proof over C (based on [Hel08]) and (b) an early attack
on SL3.)

^8 Equally important is the fact that the new growth bounds on non-commutative groups are
quantitatively strong (|A · A · A| ≥ |A|^{1+δ}). See the remarks after Theorem 2.6.

The case in some sense opposite to that of simple groups is that of solvable
groups. We will go over the case of a small but paradigmatic solvable group in
detail - the affine group of F_p. The general case of solvable subgroups
of SL_n(F_p) is treated in [GH]. A clean generalization of [GH] to F_q still
remains to be done.

It follows easily from Thm. 1.1 that the diameter of G_p = SL2(Z/pZ) with respect
to any set of generators A is (log |G_p|)^{O(1)}: applying Thm. 1.1 ℓ times, we obtain
that, if A^{3^ℓ} ≠ G_p, then

    |A^{3^\ell}| \geq |A|^{(1+\delta)^\ell},

and so ℓ ≤ (log((log |G_p|)/(log |A|)))/log(1 + δ), implying that the diameter of G_p
with respect to A is

    \ll ((\log |G|)/(\log |A|))^{O(1/\delta)}.

This proved Babai's conjecture for SL2(Z/pZ). (This was the first time Babai's con- 
jecture was confirmed for an infinite class of groups and arbitrary sets of generators.) 

If A is a subset of SL2(Z), and its projections A_p = A mod p generate G_p, then we
can do better. For example, if A is as in (1.2), then A generates a free group, i.e., any
two products a_1 ··· a_k, a'_1 ··· a'_{k'} of elements of A ∪ A^{-1} are distinct, unless they are
equal for the trivial reason of having the same reduction (e.g., x x^{-1} y z = y w^{-1} w z,
since both reduce to yz purely formally). If k and k' are ≤ log_c(p/2) (where c =
2 · 3 = 6 for A as in (1.2)), then a_1 ··· a_k ≠ a'_1 ··· a'_{k'} implies a_1 ··· a_k ≢ a'_1 ··· a'_{k'} mod p,
simply because the entries of the matrices involved are < p/2 in absolute value,
and thus cannot be congruent without being equal.^10 Hence, for k = ⌊log_c(p/2)⌋,
|(A_p ∪ A_p^{-1} ∪ {e})^k| is already quite large (≥ (2|A| − 1)^k ≥ p^{δ_c}, δ_c > 0).^11 We can then
apply Theorem 1.1 a bounded number of times, and conclude that the diameter of
G_p with respect to A_p is in fact ≪ log p ≪ log |G_p|, i.e., logarithmic.
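The argument above can be checked by brute force for short words. The sketch below is illustrative only and rests on choices not made in the text: it takes word length K = 4, enumerates all reduced words of length at most K in the matrices of (1.2) and their inverses, verifies that they give distinct integer matrices, and then reduces modulo a prime exceeding twice the largest entry. The names GENS, INVERSE, reduced_words, evaluate and next_prime_above are hypothetical helpers.

```python
# The two matrices in (1.2) and their inverses, as tuples (a, b, c, d) over Z.
GENS = {"x": (1, 3, 0, 1), "X": (1, -3, 0, 1), "y": (1, 0, 3, 1), "Y": (1, 0, -3, 1)}
INVERSE = {"x": "X", "X": "x", "y": "Y", "Y": "y"}

def mul(g, h):
    """Multiply 2x2 integer matrices stored as tuples (a, b, c, d)."""
    a, b, c, d = g
    e, f, u, v = h
    return (a*e + b*u, a*f + b*v, c*e + d*u, c*f + d*v)

def reduced_words(k):
    """All reduced words of length exactly k (no letter next to its inverse)."""
    words = [""]
    for _ in range(k):
        words = [w + s for w in words for s in GENS
                 if not w or INVERSE[w[-1]] != s]
    return words

def evaluate(word):
    """The matrix obtained by multiplying out the letters of a word."""
    m = (1, 0, 0, 1)
    for s in word:
        m = mul(m, GENS[s])
    return m

def next_prime_above(m):
    """Smallest prime strictly larger than m (trial division; fine for small m)."""
    n = m + 1
    while any(n % q == 0 for q in range(2, int(n ** 0.5) + 1)):
        n += 1
    return n

K = 4  # word length, an arbitrary small choice
words = [w for k in range(K + 1) for w in reduced_words(k)]
matrices = {w: evaluate(w) for w in words}
max_entry = max(abs(t) for m in matrices.values() for t in m)

# Any prime p > 2 * max_entry keeps all entries below p/2 in absolute value.
p = next_prime_above(2 * max_entry)
reductions = {tuple(t % p for t in m) for m in matrices.values()}
```

The number of reduced words of length at most K is 1 + 2(3^K − 1), which is the count behind the lower bound (2|A| − 1)^k quoted above (up to the choice of counting words of length exactly k or at most k).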

The same argument works, in general, when the subgroup ⟨A⟩ is Zariski-dense,
i.e., is not contained in a proper subvariety of SL2(Z). (This neat condition ensures
that (a) A mod p generates G_p = SL2(Z/pZ) for p large enough ([HP95], [Nor87],
[MVW84], [Wei84]) and (b) (by the Tits alternative) A contains elements generating
a free group.) The argument above then shows that Theorem 1.1 implies that the
diameter of SL2(Z/pZ) with respect to A mod p is logarithmic.

Bourgain and Gamburd proved a rather stronger statement. 

^9 Here we are using the stronger version of Thm. 1.1 with A^3 instead of (A ∪ A^{-1} ∪ {e})^k. If
we wanted to use Thm. 1.1 in its original version (i.e., as we stated it), then (as in [Hel08]) we
could use [Bab06, Thm. 1.4], which shows that the diameter d⃗ of G_p with respect to A is at most
O(d^2 (log |G_p|)^3), where d is the diameter of G_p with respect to A ∪ A^{-1} ∪ {e}. Here d⃗ equals the
diameter of the (directed) Cayley graph Γ(G_p, A) defined before, whereas d equals the diameter of
the undirected Cayley graph - which is just the same graph, but with arrows deleted.

^10 This argument, common in Diophantine analysis, appears in this context already in [Mar82],
as was noted in [BG08c].

^11 This bound on growth in the free group is trivial: given a word ending in, say, x, we can
choose to prolong it by any element of A ∪ A^{-1} other than x^{-1}. Note, however, that obtaining
a result like Theorem 1.1 for the free group is far from trivial; Theorem 1.1 (and [Cha08]) imply
such a result, but the first direct proof is due to Razborov [Raz], who proved a strong bound
|A · A · A| ≥ |A|^2/(log |A|)^{O(1)} for any finite subset A of a free group on at least two elements.

Theorem 1.2 (Bourgain-Gamburd). Let A ⊂ SL2(Z) generate a Zariski-dense subgroup
of SL2(Z). Then (SL2(Z/pZ), A mod p) are a family of expanders.

In general, expansion is stronger than logarithmic mixing time, which is stronger
than logarithmic diameter. Bourgain and Gamburd first show that a kind of mixing
time (in a very weak sense: an ℓ such that the ℓ_2 norm |μ^{(ℓ)}|_2 is suitably small,
where μ^{(ℓ)} is the distribution after ℓ steps of a random walk) is indeed logarithmic:
for the first log_c(p/2) steps, a random walk mixes well (for the same reason as above);
then, for a constant number of steps, they apply a result on the ℓ_2 "flattening" of
measures under convolutions (Prop. 2.5) that they prove using Thm. 1.1 (via a
non-commutative version [Tao08] of a result from additive combinatorics, the
Balog-Szemerédi-Gowers theorem). The fact that expansion does follow from a
logarithmic bound on a weak ℓ_2-mixing time for groups such as SL2(Z/pZ) is due to
Sarnak-Xue [SX91]. We will go over this in detail in §4.5.

Theorem 1.2 has found manifold applications (see, e.g., [BGS10] (the affine sieve));
we refer again to [Kowc] and [Lub12]. Before, such results as we had on expansion
in SL2(Z/pZ) were deduced from results on the spectral gap of the (continuous)
Laplacian on the surface Γ(p)\ℍ. Thm. 1.2 is a much more general result, based on
a combinatorial result, namely, Thm. 1.1. Notably, Bourgain, Gamburd and Sarnak
[BGS11] then reversed the original implication, showing that Thm. 1.2 can be used
to obtain spectral gaps for the Laplacian on general quotients Λ\ℍ (Λ ≤ SL2(Z)
Zariski-dense).

1.2.2. The symmetric group and beyond. The Classification of Finite Simple Groups 
tells us that every finite, simple, non-abelian group is either a matrix group, or the 
alternating group Alt(n), or one of a finite list of exceptions. (The list is irrelevant 
for asymptotic statements, precisely because it is finite.) All of the above work 
on matrix groups leaves unanswered the corresponding questions on diameter and 
growth in Alt(n) and other permutation groups. 

The question of the diameter of permutation groups can be stated precisely in a 
playful way. Let a set A of ways to scramble a finite set be given. This is the 
familiar setting of permutation puzzles: Rubik's cube, Alexander's star, Hungarian 
rings. . . . People say that a position has a solution if it can be unscrambled back to 
a fixed 'starting position' by means of some succession of moves in A. Given that 
we are told that a position has a solution, does it follow that it has a short solution? 

The answer is yes [HS]. The only condition is that ⟨A⟩ be transitive, i.e., that,
given two elements x, y of the finite set Ω, there be a succession of moves in A that, when
combined, take x to y. (Transitivity is necessary: it is easy to construct a
non-transitive group of very large diameter [BS92, Example 1.2]. However, if the number
of orbits^12 is bounded, then the problem reduces to the transitive case.)

It is easy to see that a request for short solutions is the same as one for a small
diameter: if A = A^{-1}, the diameter diam(Γ(⟨A⟩, A)) equals the maximum, over all
positions, of the length of the shortest solution to that position.
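The equivalence between shortest solutions and the diameter is easy to see in code for a toy puzzle. The sketch below is a hedged illustration, with choices that are assumptions of the example rather than anything in the text: the "puzzle" has n = 6 pieces and three moves (swap the first two pieces, rotate all pieces one step, rotate back), and breadth-first search from the solved position gives the length of the shortest solution of every position. The names compose, inverse and cayley_diameter are hypothetical helpers.

```python
from collections import deque
from math import factorial

def compose(p, q):
    """Composition of permutations given as tuples: (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def inverse(p):
    """Inverse permutation of p."""
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def cayley_diameter(moves, n):
    """Breadth-first search from the identity in the Cayley graph of <moves>.

    Returns (largest word length found, number of positions reached)."""
    identity = tuple(range(n))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        for m in moves:
            h = compose(m, g)
            if h not in dist:
                dist[h] = dist[g] + 1
                queue.append(h)
    return max(dist.values()), len(dist)

n = 6
swap = (1, 0, 2, 3, 4, 5)    # exchange the first two pieces
cycle = (1, 2, 3, 4, 5, 0)   # rotate all pieces by one step
moves = {swap, cycle, inverse(cycle)}   # a symmetric set of moves, A = A^{-1}

diameter, reachable = cayley_diameter(moves, n)
is_whole_group = (reachable == factorial(n))
```

For such a tiny n the search is exhaustive; the content of the results discussed here is a bound valid for all n and all generating sets, which no enumeration can supply.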

^12 An orbit, in a permutation group G ≤ Sym(n), means an orbit of {1, 2, ..., n} under the action
of G. Thus, Rubik's cube has three orbits: corners, sides and centers (if you are allowed to rotate
the cube in space).

As [BS92] showed, questions on the diameter of permutation groups reduce (with
some loss) to the case G = Alt(n). Babai's conjecture [BS88, p. 176] gives that, for
any A ⊂ Alt(n) generating Alt(n),

    \mathrm{diam}\, \Gamma(\mathrm{Alt}(n), A) \ll (\log |G|)^{O(1)} \ll n^{O(1)}.

This special case of the conjecture actually predates Babai; [BS92, p. 232] calls
it "folkloric". There are earlier references in print [KMS84], [McK84] where the
question is posed as to the exact conditions under which a bound of n^{O(1)} might be
valid; in particular ([KMS84, §4]), might transitivity be sufficient? A weaker bound
of

(1.3)    \mathrm{diam}(G) \leq \exp((\log n)^{O(1)})

was conjectured for all transitive subgroups of Sym(n) by Babai and Seress [BS92,
Conj. 1.6].

Theorem 1.3 (Helfgott-Seress [HS]). Let G = Sym(n) or Alt(n). Let A be any set
of generators of G. Then

    \mathrm{diam}(\Gamma(G, A)) \leq \exp(O((\log n)^4 \log \log n)),

where the implied constant is absolute.

By [BS92], this implies that (1.3) holds for all transitive permutation groups on
n elements.

What is the importance of the symmetric case, from the perspective of linear 
groups? It is not just a matter of historical importance (in that conjectures for 
permutation groups preceded Babai's more general conjecture) or of generality. The 
groups Sym(n) and Alt(n) are, in a sense, creatures of pure rank; Alt(n) corresponds 
particularly closely to what SL_n over a field with one element would be like.^13 Uniformity
on the rank is precisely what is still missing in the linear algebraic case;
the new result on Sym(n) and Alt(n) can be seen as breaking the barrier of rank 
dependence, just as [Hel08] showed that independence on the field was a feasible 
goal. 

* * * 

Before [HS], the strongest bound on the diameter of permutation groups was
that of Babai-Seress [BS88], who showed that, for any permutation group G on n
elements, and any A ⊂ G generating G,

    \mathrm{diam}(\Gamma(G, A)) \leq \exp((1 + o(1)) \sqrt{n \log n}).

While this is much weaker than (1.3), it does not assume transitivity (and indeed
it can be tight for non-transitive groups). Moreover, the proof (see also [BLS87])
contains an idea that was useful in [HS, §3.6].

There is also, notably, [BBS04], which proved the polynomial bound

    \mathrm{diam}(\Gamma(G, A)) \ll_\varepsilon n^{O(1)}



While the field with one element does not exist, objects over the field with one element can be 
defined and studied. This is an idea going back to Tits [Tit57] : see, e.g., [Los]. 






for G = Sym(n) or G = Alt(n) provided that A contains at least one element g ≠ e
such that the support

    \mathrm{supp}(g) = \{x \in \{1, 2, \dots, n\} : gx \neq x\}

has no more than n/(3 + ε) elements, ε > 0. There was also an older result [McK84]
(see also [DF87]) proving a polynomial bound when the support of every element of
A is of bounded size. The condition in [BBS04] was relaxed to |supp(g)| ≤ 0.63n in
[BGH+] (previous to [HS]).

Quite besides the fact that the main result in [BBS04] gets used in [HS], an
important idea in the argument in [BBS04] plays a key role in [HS]. The proof in
[HS] uses a short random walk on Γ(G, A) to obtain an almost uniform distribution
on {1, 2, ..., n}, or k-tuples of elements of {1, 2, ..., n} (k bounded). This fits right
into one of the leitmotifs in [HS], namely, that a probabilistic proof can be turned
into a stochastic one. In combinatorics, a probabilistic proof shows the existence of
an object by showing it appears with positive probability under some probability
distribution. The idea in [HS] is that, even if we do not have the right to impose a
probability distribution (in the sense of choosing a random element of the group G),
we can mimic a probabilistic proof or implement a probabilistic idea by following
a well-chosen random process. It is thus that Babai's splitting lemma^14 [Bab82] is
adapted in [HS, §5], by the use of a random walk as in [BBS04]. The random process
need not, however, be a random walk on the Cayley graph; an example is the proof
of the existence of small generating sets in [HS, §4.2] (explained here in §5.4).

* * * 

In parallel to the work on permutation groups in the line of [BLS87], [BBS04] et al.
- works having their roots in the study of algorithms - there is also an entire related
area of work coming from probability theory. This area is well represented by the
text [LPW09]: the emphasis there is in part on mixing times for random processes
that may be more general than a random walk. See, for example, results expressed in
terms of card-shuffling, such as the Bayer-Diaconis "seven-shuffle" theorem [BD92].

The interest in studying the diameter and the spectral gap of Γ(⟨A⟩, A) for A =
{g, h} ⊂ Sym(n), g, h random, comes in part from this area. (This is also of interest
for linear algebraic groups; see [BG08c].) Here a result of Babai-Hayes [BH05]
based on [BBS04] shows that, almost certainly (i.e., with probability 1 − o(1) as
n → ∞), the diameter of Γ(⟨A⟩, A) is polynomial in n. (A classical result of Dixon
[Dix69] states that ⟨A⟩ is almost certainly Sym(n) or Alt(n).) Schlage-Puchta [SP12]
improved the bound to O(n^3 log n).

In upcoming work [HSZ], Helfgott, Seress and Zuk prove that the diameter of
Γ(⟨A⟩, A) is in fact n^2 (log n)^{O(1)} with probability 1 − o(1); the ℓ_1 mixing time is
n^3 (log n)^{O(1)}. At play is a generalization of some of the analysis in Broder and
Shamir on random graphs [BS87] as well as part of the procedure in [BBS04]: there



^14 Both [Bab82] and [Pyb93] (also used in [HS]) had as their aim to provide a partial classification
of subgroups of Sym(n) avoiding the Classification Theorem. Thus, they turn out to have played a
role in the study of the diameter of permutation groups very similar to that which, as we will later
see (§4.2), was played by [LP11] in the study of growth in linear algebraic groups.

is some common ground with the ideas in [HS] (discussed here in §5.4) on generation
and random walks.

1.3. Notation. By f(n) ≪ g(n), g(n) ≫ f(n) and f(n) = O(g(n)) we mean the
same thing, namely, that there are N > 0, C > 0 such that |f(n)| ≤ C · g(n) for all
n ≥ N. We write ≪_a, ≫_a, O_a if N and C depend on a (say).

We write O*(x) to mean any quantity at most x in absolute value. Thus, if
f(n) = O*(g(n)), then f(n) = O(g(n)) (with N = 1 and C = 1).

Given a subset A ⊂ X, we let 1_A : X → C be the characteristic function of A:

    1_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{otherwise.} \end{cases}

1.4. Acknowledgements. The present paper was written in fulfillment of the con- 
ditions for the author's Adams Prize (Cambridge). The author is very thankful to 
the Universidad Autonoma de Madrid for the opportunity to lecture on the subject 
during his stay there as a visiting professor. Notes taken by students during his 
previous series of lectures at the AQUA summer school (TIFR, Mumbai, 2010) were 
also helpful. 

A special place is merited here by A. Granville, whose course and whose notes 
[Gra] introduced the author to additive combinatorics in 2005. E. Kowalski, M.
Rudnev, P. Varjú and E. Vlad must also be thanked particularly for their close
reading of large sections of the paper. Many thanks are also due to L. Babai, N. 
Gill, B. Green, L. Pyber and P. Spiga for their helpful comments. 



2. Background: arithmetic combinatorics 

The terms "additive combinatorics" and "arithmetic combinatorics" are relatively 
new. To judge from [TV06], they cover at least some of additive number theory and
the geometry of numbers. What may be called the core of additive combinatorics
is the study of the behavior of arbitrary sets under addition (as opposed to, say,
the primes or kth powers). In this sense, the subject originated from at least two
streams, one coursing through work on arithmetic progressions by Schur, van der
Waerden, Roth [Rot53], Szemerédi [Sze69], Furstenberg [Fur77], Gowers [Gow01],
and Green/Tao [GT08], among others, and another based on the study of growth in
abelian groups, starting with Freiman [Fre73], Erdős-Szemerédi [ES83] and Ruzsa.
There has also been a vein of a more geometrical flavor (e.g., [ST83]).

The use of the term arithmetic combinatorics instead of additive combinatorics 
emphasizes results on growth that do not require commutativity, as well as results 
on fields and rings (the sum-product theorem, §2.3).

2.1. Triple products and approximate subgroups. Some of additive combinatorics
can be described as the study of sets that grow slowly. In abelian groups,
results are often stated so as to classify sets A such that |AA| is not much larger
than |A|; in non-abelian groups, works starting with [Hel08] classify sets A such that
|AAA| is not much larger than |A|.

There is a reason for this difference in conventions. In an abelian group, if |AA| <
K|A|, then |A^k| < K^{O(k)}|A| - i.e., if a set does not grow after one multiplication
with itself, it will not grow under many. This is a result of Plünnecke [Plü70] and
Ruzsa [Ruz89]. (Petridis [Pet] recently gave a purely additive-combinatorial proof.)
In a non-abelian group G, there can be sets A breaking this rule: for example, if
H < G, g ∈ G \ H and A = H ∪ {g}, then |AA| < 3|A|, but AAA ⊃ HgH, and HgH
can be much larger than A. (For instance, if H is the subgroup of G = SL2(Z/pZ)
leaving a basis vector e_1 fixed, and w is the element of G taking e_1 to e_2 and e_2 to
−e_1, then HwH is of size |H|^2. We will later see (proof of Prop. 4.2) that this is
not an isolated example - it can be quite useful to stick a subgroup H in different
directions (so to speak) in order to get a large product.)
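The example A = H ∪ {w} can be verified directly. The following sketch is an illustration under assumptions of its own: it fixes p = 7, takes H to be the upper unipotent subgroup (the stabilizer of e_1) and w as described above, and compares |A|, |AA| and the size of HwH inside AAA. The names mul and product_set are hypothetical helpers.

```python
P = 7  # a small prime, chosen only for illustration

def mul(g, h, p=P):
    """Multiply 2x2 matrices stored as tuples (a, b, c, d), entries mod p."""
    a, b, c, d = g
    e, f, u, v = h
    return ((a*e + b*u) % p, (a*f + b*v) % p, (c*e + d*u) % p, (c*f + d*v) % p)

def product_set(X, Y):
    """The set X . Y = {xy : x in X, y in Y}."""
    return {mul(x, y) for x in X for y in Y}

# H: the matrices fixing the basis vector e_1, i.e. (1 x; 0 1) with x in Z/pZ.
H = {(1, x, 0, 1) for x in range(P)}
# w sends e_1 to e_2 and e_2 to -e_1, so its columns are (0, 1) and (-1, 0).
w = (0, P - 1, 1, 0)

A = H | {w}
AA = product_set(A, A)
AAA = product_set(AA, A)
HwH = {mul(mul(h1, w), h2) for h1 in H for h2 in H}
```

Here |A| = p + 1 while HwH already has p^2 elements, so doubling is small but tripling is not.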

However, Ruzsa's ideas do carry over to the non-abelian case, as was pointed
out in [Hel08] and [Tao08]; in fact, [RT85] carries over without change, since the
assumption that G is abelian is never really used. We must assume that |AAA| is
small, not just |AA|, and then it does follow that |A^k| is small.

Lemma 2.1 (Ruzsa triangle inequality). Let A, B and C be finite subsets of a group
G. Then

(2.1)    |A C^{-1}| |B| \leq |A B^{-1}| |B C^{-1}|.

Commutativity is not needed. In fact, what is being used is in some sense more 
basic than a group structure; as shown in [GHR], the same argument works naturally
in any abstract projective plane endowed with the little Desargues axiom. 

Proof. We will construct an injection ι : AC^{-1} × B → AB^{-1} × BC^{-1}. For every
d ∈ AC^{-1}, choose (f_1(d), f_2(d)) = (a, c) ∈ A × C such that d = ac^{-1}. Define
ι(d, b) = (f_1(d) b^{-1}, b (f_2(d))^{-1}). We can recover d = f_1(d)(f_2(d))^{-1} from ι(d, b);
hence we can recover (f_1, f_2)(d) = (a, c), and thus b as well. Therefore, ι is an
injection. □
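The injection in the proof is explicit enough to be run. The sketch below builds ι for three subsets of Sym(4) and checks that it is injective and lands in AB^{-1} × BC^{-1}. The choice of group, the subset sizes, the fixed random seed and the helper names compose, inverse, product_set and ruzsa_injection are all assumptions made for the sake of illustration.

```python
import random
from itertools import permutations

def compose(p, q):
    """Composition of permutations given as tuples: (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def inverse(p):
    """Inverse permutation of p."""
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def product_set(X, Y):
    """The set X . Y = {xy : x in X, y in Y}."""
    return {compose(x, y) for x in X for y in Y}

def ruzsa_injection(A, B, C):
    """The map iota of the proof, as a dict (d, b) -> (a b^{-1}, b c^{-1}).

    For each d in A C^{-1} one representation d = a c^{-1} is fixed; this
    plays the role of (f_1(d), f_2(d))."""
    rep = {}
    for a in A:
        for c in C:
            rep.setdefault(compose(a, inverse(c)), (a, c))
    iota = {}
    for d, (a, c) in rep.items():
        for b in B:
            iota[(d, b)] = (compose(a, inverse(b)), compose(b, inverse(c)))
    return iota

random.seed(0)  # fixed seed so that the example is reproducible
G = list(permutations(range(4)))
A, B, C = (set(random.sample(G, 6)) for _ in range(3))

iota = ruzsa_injection(A, B, C)
AC = product_set(A, {inverse(c) for c in C})
AB = product_set(A, {inverse(b) for b in B})
BC = product_set(B, {inverse(c) for c in C})
```

Since the product of the two coordinates of ι(d, b) is d, distinct pairs (d, b) have distinct images, which is exactly what the test below confirms on this sample.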

It follows easily that

(2.2)    \frac{|(A \cup A^{-1} \cup \{e\})^3|}{|A|} \leq \left( 3 \frac{|A \cdot A \cdot A|}{|A|} \right)^3

for any finite subset A of any group G, and, moreover,

(2.3)    \frac{|A^k|}{|A|} \leq \left( \frac{|A^3|}{|A|} \right)^{k-2}

for any A ⊂ G such that A = A^{-1} (i.e., A contains the inverse of every element
in A). (Both of these statements go back to Ruzsa (or Ruzsa-Turjányi
[RT85]), at least for G abelian.) For example, |AA^{-1}A||A| = |AA^{-1}A||A^{-1}| ≤
|AA||A^{-1}A^{-1}A| (by Lemma 2.1 with B = A^{-1} and C = A^{-1}A) and |A^{-1}A^{-1}A||A| ≤
|A^{-1}A^{-1}A^{-1}||AA| = |AAA||AA| (again by Lemma 2.1), implying |AA^{-1}A|/|A| ≤
|AA|^2 |AAA|/|A|^3 ≤ (|AAA|/|A|)^3; the rest of (2.2) and (2.3) is left as an exercise.
This means that, from now on, we can generally focus on studying when |AAA|
is or isn't much larger than |A|, assuming, without any essential loss of generality,
that A = A^{-1} and e ∈ A. Obviously, we can apply (2.3) to A ∪ A^{-1} ∪ {e} after
applying (2.2).

The paper [Tao08] focused on translating several results from additive combinatorics
to the non-abelian context. In the course of this task, Tao defined what he
called an approximate group. (Approximate subgroup might be more suggestive, as
will become clear later.) A K-approximate subgroup of a group G is a set A ⊂ G
such that

(a) A = A^{-1} and e ∈ A,

(b) there is a subset X ⊂ G such that |X| ≤ K and A · A ⊂ X · A.

This is essentially equivalent to the notion of a slowly growing set A (or set of
small tripling) as one for which |AAA| ≤ K'|A|: a K-approximate group is a slowly
growing set (trivially, with K' = K^2) and, for a slowly growing set A with A = A^{-1}
and e ∈ A, the set A^3 is a (K')^{O(1)}-approximate subgroup; this was shown by Tao
[Tao08, Cor. 3.11], with the essential ingredient being the Ruzsa covering lemma
([Ruz99]).

Lemma 2.2 (Ruzsa covering lemma). Let $A$ and $B$ be finite subsets of a group $G$.
Assume $|A \cdot B| \le K|B|$. Then there is a subset $X \subset A$ with $|X| \le K$ such that
$A \subset X \cdot B \cdot B^{-1}$.

Proof. Let $\{a_1, a_2, \dots, a_k\}$ be a maximal subset of $A$ with the property that the
translates $a_jB$, $1 \le j \le k$, are all disjoint. It is clear that $k \le |A \cdot B|/|B| \le K$.
Let $x \in A$. Since $\{a_1, a_2, \dots, a_k\}$ is maximal, there is a $j$ such that $a_jB \cap xB$ is
non-empty. Then $x \in a_jBB^{-1}$. Thus, the sets $a_jBB^{-1}$ cover $A$. □
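
The maximal disjoint family in the proof can be produced greedily. The sketch below is one way to do this, assuming permutations as tuples for the group elements; `ruzsa_cover` is a hypothetical helper name, and the greedy scan order is an arbitrary choice.

```python
import itertools
import random

def mul(p, q):
    """Compose permutations stored as tuples: (p*q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def inv(p):
    r = [0] * len(p)
    for i, j in enumerate(p):
        r[j] = i
    return tuple(r)

def ruzsa_cover(A, B):
    """Greedily pick X in A so that the translates xB are pairwise disjoint.

    After a full pass over A the family is maximal: every a that was skipped
    had aB meeting an earlier translate, and the covered set only grows.
    """
    X, covered = [], set()
    for a in sorted(A):
        aB = {mul(a, b) for b in B}
        if covered.isdisjoint(aB):
            X.append(a)
            covered |= aB
    return X

def is_covered(A, B, X):
    """Check A is contained in X * B * B^-1."""
    XBBinv = {mul(mul(x, b1), inv(b2)) for x in X for b1 in B for b2 in B}
    return set(A) <= XBBinv
```

Since the translates $xB$, $x \in X$, are disjoint subsets of $A \cdot B$, the output always satisfies $|X| \cdot |B| \le |A \cdot B|$.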

Tao also showed that one can classify sets $A$ of small doubling in terms of approx-
imate subgroups, using the covering lemma as one of the main tools:

Lemma 2.3. [Tao08, Cor. 4.7] Let $A$ be a finite subset of a group $G$. If $|A \cdot A| \le K|A|$
or $|A \cdot A^{-1}| \le K|A|$, then $A$ lies in the union of at most $O(K^{O(1)})$ cosets of an
$O(K^{O(1)})$-approximate subgroup $H$ of size $|H| \ll K^{O(1)}|A|$.

2.2. Balog-Szemeredi-Gowers. Flattening lemma (Bourgain-Gamburd). 

The first version of the following result was due to Balog and Szemerédi [BS94].
Gowers [Gow98, Prop. 12] improved the bounds dramatically, making all depen-
dencies polynomial; this is needed for our applications. Then Tao showed that
the proof (which is essentially graph-theoretical) also works in a non-commutative
setting [Tao08, §5].

First, we need a definition. Its commutative counterpart, the additive energy, is
very common in additive combinatorics.

Definition 1. Let $G$ be a group. Let $A, B \subset G$ be finite sets. The multiplicative
energy $E(A,B)$ is

$E(A,B) = \sum_{g \in G} |(1_A * 1_B)(g)|^2 = |\{(a_1,a_2,b_1,b_2) \in A \times A \times B \times B : a_1 b_1 = a_2 b_2\}|.$

Clearly, $E(A,B) \le \min(|A|^2|B|, |A||B|^2)$. The convolution $f * g$ is defined by

$(f * g)(x) = \sum_y f(x y^{-1}) g(y).$
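
The two descriptions of the multiplicative energy (as a squared $\ell_2$ norm of a convolution and as a count of quadruples) can be compared directly. This is a small sketch with permutations as tuples; the function names are illustrative assumptions.

```python
import itertools
import random
from collections import Counter

def mul(p, q):
    """Compose permutations stored as tuples: (p*q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def energy_by_counting(A, B):
    """Count quadruples (a1, a2, b1, b2) with a1 b1 = a2 b2."""
    return sum(1 for a1 in A for a2 in A for b1 in B for b2 in B
               if mul(a1, b1) == mul(a2, b2))

def energy_by_convolution(A, B):
    """Sum over g of (1_A * 1_B)(g)^2, where (1_A * 1_B)(g) = #{(a, b) : ab = g}."""
    r = Counter(mul(a, b) for a in A for b in B)
    return sum(v * v for v in r.values())
```

For a finite subgroup $H$ every element of $H$ has exactly $|H|$ representations as a product, so $E(H,H) = |H|^3$, the largest value the trivial bound allows.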

Proposition 2.4 (Non-commutative Balog-Szemerédi-Gowers [Tao08]). Let $G$ be
a group. Let $A, B \subset G$ be finite. Suppose that $E(A,B) \ge |A|^{3/2}|B|^{3/2}/K$. Then
there are $A' \subset A$, $B' \subset B$ such that $|A'| \gg |A|/K$, $|B'| \gg |B|/K$ and $|A' \cdot B'| \ll
K^{O(1)} \sqrt{|A||B|}$, where the implied constants are absolute.

The Balog-Szemerédi-Gowers theorem (for $G$ commutative) already played a mi-
nor role in [Hel08]; [BG08b] would later show this was not necessary. What concerns
us most here is its use for $G$ non-commutative in [BG08c]: Bourgain and Gamburd
showed how to use Prop. 2.4 to reduce a statement on the "flattening" of measures
to a statement about the growth of sets (namely, Thm. 1.1).

Proposition 2.5 ([BG08c], "flattening lemma"). Let $G$ be a finite group. Let $\mu$ be
a probability measure on $G$ with $\mu(g) = \mu(g^{-1})$ for all $g \in G$. Suppose that

(2.4) $|\mu * \mu|_2 \ge \frac{1}{K} |\mu|_2$

for some $K > 0$. Then there is a $K^{O(1)}$-approximate subgroup $H \subset G$ of size
$\ll K^{O(1)}/|\mu|_2^2$ and an element $g \in G$ such that $\mu(Hg) \gg K^{-O(1)}$. (The implied
constants are absolute.)

Note that $\mu(Hg) \gg K^{-O(1)}$ implies $(\mu * \mu)(H^2) \ge \mu(Hg)\mu(g^{-1}H) \gg K^{-O(1)}$
(since $\mu(g) = \mu(g^{-1})$ and $H = H^{-1}$).

Proof. Consider first the case of $\mu = (1/|A|) 1_A$, where $1_A$ is the characteristic func-
tion of a set $A \subset G$ (i.e., $1_A(g) = 1$ if $g \in A$, $1_A(g) = 0$ if $g \notin A$). Then

(2.5) $|\mu * \mu|_2^2 = \frac{E(A,A)}{|A|^4}, \quad |\mu|_2^2 = \frac{1}{|A|}.$

Thus, (2.4) means that $E(A,A) \ge K^{-2}|A|^3$. Hence, by Prop. 2.4, there are $A_1', A_2' \subset
A$ such that $|A_1'|, |A_2'| \gg |A|/K^2$ and $|A_1'A_2'| \ll K^{O(1)}|A|$. By the Ruzsa
triangle inequality (2.1), $|A_1'A_1'^{-1}| \ll K^{O(1)}|A_1'|$. Thus, by Lem. 2.3, $A_1'$ lies in a union
of $\ll K^{O(1)}$ cosets of an $O(K^{O(1)})$-approximate subgroup $H$ of size $\ll K^{O(1)}|A_1'|$.
At least one of these cosets $Hg$ must contain $\gg K^{-O(1)}|A_1'|$ elements of $A_1'$,
and thus of $A$. Hence $\mu(Hg) \gg K^{-O(1)}$.

Now consider the case of general $\mu$. The idea is that (thanks in part to (2.4))
the bulk of $\mu$ is given by the values $\mu(g)$ neither much larger nor much smaller than
a certain value $a$; that "bulk" (call it $\mu_\sim$) behaves essentially as a characteristic
function, thus reducing the situation to the one we have already considered.

Inspired by the second equation in (2.5), we define $a = |\mu|_2^2$, and let $A$ be the
set of all $g \in G$ with $\mu(g) \ge a/(CK^c)$, where $c, C > 0$ will be set later. We let
$\mu_A = (1/|A|) 1_A$; we must check that $|\mu_A * \mu_A|_2$ is large relative to $|\mu_A|_2 = 1/\sqrt{|A|}$.



We are giving Bourgain and Gamburd's proof with a technical simplification due to Tao
[Tao]. Wigderson seems to have suggested an analogous simplification (based on an idea already in
[BIW06]).

First, note that each $g \in A$ makes, by definition, a contribution of $\ge a^2/(C^2K^{2c})$
to $|\mu|_2^2 = a$; hence $|A| \le C^2K^{2c}/a$, and so $1/|A| \ge a/(C^2K^{2c})$.

We split $\mu = \mu_< + \mu_\sim + \mu_>$, where $\mu_<(g) = \mu(g)$ when $\mu(g) < a/(CK^c)$ and $0$
otherwise, and $\mu_>(g) = \mu(g)$ when $\mu(g) > CK^c a$ and $0$ otherwise. Now $|f * g|_2 \le
|f|_1 |g|_2$ for any $f, g$ (Young's inequality, special case; follows from Cauchy-Schwarz).
Hence

$|\mu * \mu_<|_2 = |\mu_< * \mu|_2 \le |\mu|_1 |\mu_<|_2 \le 1 \cdot \sqrt{|\mu_<|_\infty |\mu_<|_1} \le \sqrt{\frac{a}{CK^c}},$

$|\mu_> * \mu|_2 = |\mu * \mu_>|_2 \le |\mu_>|_1 |\mu|_2 \le \frac{|\mu|_2^2}{CK^c a} \cdot |\mu|_2 = \frac{\sqrt{a}}{CK^c}.$

Thus, we can afford to cut off the tails: we obtain, by (2.4),
(2.6) |Ai^*/i^|2>i^-^^/^--^=>^K-^x/^, 

where we have set $C = 5$, $c = 2$. We are almost done; we now need to go from $\mu_\sim$,
which is roughly a characteristic function, to $\mu_A$, which is actually a characteristic
function.

The inequality (2.6) enables us to bound



l/l^l , , l/l^l 1 1 ^ 1 l/l^l 

By $1/|A| \ge a/(C^2K^{2c}) = a/(5^2K^4)$ and $|\mu_A * \mu_A|_2^2 = E(A,A)/|A|^4 \le 1/|A|$, this
implies both



|/iA*/iA|2 > = and 1^1 > 



53^5 5^2 I I - 54^6 • 

We now have the setup we had at the beginning, only with $\mu_A$ instead of $\mu$ and
$5^3K^5$ instead of $K$. Proceeding as before, we obtain a $K^{O(1)}$-approximate subgroup
$H \subset G$ such that $\mu_A(Hg) \gg K^{-O(1)}$ for some $g \in G$, and so

$\mu(Hg) \ge \frac{a}{CK^c} |A \cap Hg| = \frac{a|A|}{CK^c} \mu_A(Hg) \gg K^{-O(1)}.$

□ 

2.3. The sum-product theorem. Growth in solvable groups. 

2.3.1. The affine group and the sum-product theorem. The analogue of the following
result had long been known (Erdős-Szemerédi). The version over finite fields
is harder, since there is no natural topology or fully natural ordering to work with.
(Over $\mathbb{R}$, there is a brief and very natural proof [Ele97] based on a result that is
essentially topological [ST83]; the best bound for the sum-product theorem over $\mathbb{R}$
has a direct proof, also topological [Sol09].)

Theorem 2.6 (Sum-product theorem [BKT04], [BGK06]; see also [EM03]). For
any $A \subset \mathbb{F}_p^*$ with $C < |A| < p^{1-\epsilon}$, $\epsilon > 0$, we have

$\max(|A \cdot A|, |A + A|) > |A|^{1+\delta},$

where $C > 0$ and $\delta > 0$ depend only on $\epsilon$.

The proof was strengthened and simplified in [TV06] and [GK07].

The same result holds for $\mathbb{F}_q$, $q = p^\alpha$, and indeed for arbitrary fields; we must only
be careful to specify that $A$ is not concentrated in a proper subfield. The strength
of this result must be underlined: $A$ is growing by a factor of $|A|^\delta$, where $\delta > 0$ is
moreover independent of $p$. In contrast, even after impressive recent improvements
([San12]; see also [CS10]), the main additive-combinatorial result for abelian groups
(Freiman's theorem) gives growth by smaller factors.

Rather than prove Thm. 2.6, let us prove the key intermediate result towards it;
it is enough for many applications, and it also illustrates the connection between the
sum-product theorem and growth in solvable groups. The following idea was put
forward in [Hel11, §3.1] and developed there and in later works: the sum-product
theorem is really a result about the action of a group on another group; in its usual
formulation (Thm. 2.6), the group that is acting is $\mathbb{F}_p^*$ (by multiplication), and the
group being acted upon is $\mathbb{F}_p^+$.

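Let $G$ be the affine group

(2.7) $G = \left\{ \begin{pmatrix} r & a \\ 0 & 1 \end{pmatrix} : r \in \mathbb{F}_p^*, a \in \mathbb{F}_p \right\}.$

Consider the following subgroups of $G$:

(2.8) $U = \left\{ \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix} : a \in \mathbb{F}_p \right\}, \quad T = \left\{ \begin{pmatrix} r & 0 \\ 0 & 1 \end{pmatrix} : r \in \mathbb{F}_p^* \right\}.$

These are simple examples of a solvable group $G$, of a maximal unipotent subgroup
$U$ and of a maximal torus $T$. Actually, the centralizer $C(g)$ of any element $g$ of $G$
not in $\pm U$ is a maximal torus.

We look at two actions: that of $U$ on itself (by the group operation) and that of
$T$ on $U$ (by conjugation; $U$ is a normal subgroup of $G$). They turn out to correspond
to addition and multiplication in $\mathbb{F}_p$, respectively:

$\begin{pmatrix} 1 & a_1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & a_2 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & a_1 + a_2 \\ 0 & 1 \end{pmatrix},$

$\begin{pmatrix} r & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix} \begin{pmatrix} r^{-1} & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & ra \\ 0 & 1 \end{pmatrix}.$

Thus, we see that growth in $U$ (under the actions of $U$ and $T$) is tightly linked to
growth in $\mathbb{F}_p$ (under addition and multiplication).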
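
The two matrix identities behind this correspondence are easy to confirm by machine for a small prime. The following sketch works with 2x2 matrices modulo $p$ stored as nested tuples; the helper names `u`, `t` and `mat_mul` are illustrative, and the modular inverse uses Python's three-argument `pow` (Python 3.8 or later).

```python
def mat_mul(M, N, p):
    """Multiply two 2x2 matrices modulo p."""
    return tuple(
        tuple(sum(M[i][k] * N[k][j] for k in range(2)) % p for j in range(2))
        for i in range(2)
    )

def u(a, p):
    """Unipotent element of U with upper-right entry a."""
    return ((1, a % p), (0, 1))

def t(r, p):
    """Torus element of T with upper-left entry r (r must be nonzero mod p)."""
    return ((r % p, 0), (0, 1))

def conjugate(r, a, p):
    """Return t(r) u(a) t(r)^{-1} as a matrix modulo p."""
    r_inv = pow(r, -1, p)
    return mat_mul(mat_mul(t(r, p), u(a, p), p), t(r_inv, p), p)
```

Multiplication in $U$ adds the upper-right entries, and conjugation by an element of $T$ multiplies the upper-right entry by $r$.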

In fact, the result we will prove on these two actions (Prop. 3.7) implies immedi-
ately the "key intermediate result" we want:

Proposition 2.7 ([GK07], Corollary 3.5). Let $X \subset \mathbb{F}_p$, $Y \subset \mathbb{F}_p^*$ be given with
$X = -X$, $0 \in X$, $1 \in Y$. Then

$|4YX + 2Y^2X| \ge \frac{1}{2} \min(|X||Y|, p).$

We write $4X$ (say) for $X + X + X + X$ when $X$ is a subset of an additive group; thus, e.g.,
$2Y^2X = Y^2X + Y^2X$.

Thm. 2.6 follows from Prop. 2.7 after the application of the Katz-Tao Lemma
[TV06, §2.8], which plays a role for sums and products analogous to that played by
Lemma 2.1 (Ruzsa) for group operations.

2.3.2. Solvable and nilpotent groups. Gill and Helfgott [GH] proved growth in all
solvable subgroups of $GL_n(\mathbb{F}_p)$, in the sense of Prop. 3.8. The main two challenges
were the existence of elements outside $U$ that are not semisimple (and thus their
action on $U$ has non-trivial fixed points) and the relatively complicated subgroup
structure that solvable subgroups of $GL_n(\mathbb{F}_p)$ can have. The case of solvable groups
over $\mathbb{F}_q$ remains open; a proof along the lines of [GH] should be feasible but cumber-
some. (The case of $GL_n(\mathbb{F}_q)$ does not reduce to $GL_n(\mathbb{F}_p)$, since that would increase
the rank $n$ depending on $q$, and we want results independent of $p$ or $q$.) As usual in
this context, infinite fields can be easier if they have a "sensible" topology and/or if
the subgroup structure is simpler ([BG11a], [BG11b], [BG12]). The problem also be-
comes more accessible if, instead of aiming at bounds of the quality $|A \cdot A \cdot A| \ge |A|^{1+\delta}$,
we aim at much weaker bounds [Tao10], since then more tools are admissible.

Already in $G$ as in (2.7), growth-related behavior can be complex. In the above we
showed that subsets $A$ of $G$ do grow rapidly under the group operation, outside some
very specific circumstances. However, the action of $G$ on $U$ does not, in general,
give us expansion. To be precise: identify $U$ with $\mathbb{F}_p$, fix a $\lambda \in \mathbb{Z}^+$, and say we have
$\epsilon$-expansion if, for every $S \subset \mathbb{F}_p$ with $|S| \le p/2$,

(2.9) $|S \cup (S + 1) \cup \lambda S| \ge (1 + \epsilon)|S|.$

(Here the addition of 1 can be thought of as coming from the action of $U$ on itself,
and multiplication by $\lambda$ comes from the action of $G/U$ on $U$ by conjugation.) Now,
the spectrum of the discrete Laplacian for the Schreier graph $\Gamma$ given by $x \mapsto x + 1$,
$x \mapsto \lambda x$ is given in [MV00]: the non-existence of a spectral gap implies, in particular,
that there is no fixed $\epsilon > 0$ such that (2.9) holds for all $S \subset \mathbb{F}_p$ with $|S| \le p/2$ and
all (sufficiently large) $p$.

J. Cilleruelo points out that one can prove this directly by modifying a construc-
tion by G. Fiz Pontiveros [Fiz], based in turn on an idea of Rokhlin's [Rok63]: let
$I$ be the reduction of $\{0 \le n \le \epsilon p/3\}$ modulo $p$, and let $\phi : \mathbb{Z}/p\mathbb{Z} \to \mathbb{Z}/p\mathbb{Z}$ be the
multiplication-by-$\lambda$ map; define

o<j<| 

Then \S\ ~ p/3 and |5 U (5 + 1) U A5| < (1 + ^\S\ for any p larger than a constant 
depending only on e. 

Can one make the somewhat weaker statement that the diameter of the Schreier
graph $\Gamma$ is small? For $\lambda \in \mathbb{Z}^+$, this is very easy: writing $x$ in base $\lambda$, we obtain
that the diameter is $O_\lambda(\log p)$. If we want a bound independent of $\lambda$ for $\lambda \in \mathbb{F}_p^*$
arbitrary, the problem is subtler. (We must impose the condition that the order of
$\lambda$ not be too small, or else what we try to prove could be false.) Using a result of
Konyagin [Kon92], it is possible to show that the diameter is $O((\log p)^{O(1)})$, where
both implied constants are absolute.

A Schreier graph is given by a set of actions $A$ on a set $X$ (here $X = \mathbb{F}_p$); the set of vertices
is $X$, and the set of edges is $\{(x, a(x)) : x \in X, a \in A\}$. A Cayley graph is thus a special case of a
Schreier graph: $A \subset X$, $A$ acts on $X$ by multiplication.
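
For small primes the diameter of this Schreier graph can simply be computed by breadth-first search. The sketch below treats the graph as undirected (so each vertex is also joined to its preimages under the two maps), which is an assumption about the convention; `schreier_diameter` is an illustrative name.

```python
from collections import deque

def schreier_diameter(p, lam):
    """Diameter of the undirected Schreier graph on Z/pZ for x -> x+1, x -> lam*x.

    lam must be invertible modulo the prime p.
    """
    inv_lam = pow(lam, -1, p)

    def neighbours(x):
        # Images and preimages under the two generating maps.
        return ((x + 1) % p, (x - 1) % p, (lam * x) % p, (inv_lam * x) % p)

    diameter = 0
    for start in range(p):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in neighbours(x):
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        diameter = max(diameter, max(dist.values()))
    return diameter
```

For $\lambda = 2$ and $p = 101$, writing a residue in base 2 (double, then possibly add 1) reaches it from 0 in at most 13 steps, so any two vertices are at distance at most 26. With $\lambda = 1$ the graph is a cycle and the diameter is $(p-1)/2$.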

Nilpotent groups. The case of nilpotent groups is fairly close to the special case
of abelian groups. Here Fisher-Katz-Peng [FKP10] and Tao [Tao10] laid out the
groundwork; the recent preprint [Toi] contains a general statement. In summary:
tools that yield Freiman's theorem over abelian groups (also called Freiman-Ruzsa
in that generality) can be adapted to work over nilpotent groups. On the flip side,
bounds are quantitatively weak and necessarily conditional on the non-existence of
significant structure (progressions), just as in the abelian case.

Finally, let us take a brief look at the asymptotics of growth in infinite solvable
and nilpotent groups. Here some of the main results are [Wol68], [Mil68], [Bas72],
[Gui73]: in summary, a set of generators $A$ of a solvable group $G$ has polynomial
growth (i.e. $|A^k| \ll k^{O(1)}$) if and only if $G$ has a nilpotent subgroup of finite index.
(When $G$ is not assumed to be solvable, the "only if" direction becomes very hard;
this is due to Tits for linear groups (a consequence of the "Tits alternative" [Tit72])
and to Gromov for general groups (the celebrated [Gro81]).)



3. Group actions: stabilizers, orbits and pivots 

3.1. The orbit-stabilizer theorem for sets. A leitmotif recurs in recent work on
growth in groups: results on subgroups can often be generalized to subsets. This is
especially the case if the proofs are quantitative, constructive, or, as we shall later
see, probabilistic.

The orbit-stabilizer theorem for sets is an example both paradigmatic and basic;
it underlies a surprising number of other results on growth. It also helps to put
forward a case for seeing group actions, rather than groups themselves, as the main
object of study. We state it as in [HS, §3.1], though it is already implicit in [Hel08]
(and clear in [Hel11]).

We recall that an action $G \curvearrowright X$ is a homomorphism from a group $G$ to the group
of automorphisms of a set $X$. (The automorphisms of a set $X$ are just the bijections
from $X$ to $X$; we will see actions on objects with richer structures later.) For $A \subset G$
and $x \in X$, the orbit $Ax$ is the set $Ax = \{g \cdot x : g \in A\}$. The stabilizer $\mathrm{Stab}(x) \subset G$
is given by $\mathrm{Stab}(x) = \{g \in G : g \cdot x = x\}$.

(Permutation group theorists prefer to use actions on the right; they write $x^g$ for
$g(x)$, $G_x$ for $\mathrm{Stab}(x)$, and use right cosets by default. We will use that notation
when we discuss permutation groups, where we will also write $x^A$ instead of $Ax$, in consequence.)

Lemma 3.1 (Orbit-stabilizer theorem for sets). Let $G$ be a group acting on a set
$X$. Let $x \in X$, and let $A \subset G$ be non-empty. Then

(3.1) $|(A^{-1}A) \cap \mathrm{Stab}(x)| \ge \frac{|A|}{|Ax|}.$

Moreover, for every $B \subset G$,

(3.2) $|BA| \ge |A \cap \mathrm{Stab}(x)| |Bx|.$

The usual orbit-stabilizer theorem is the special case $A = B = H$, $H$ a subgroup
of $G$.

Sketch of proof. Exercise: (3.1) is proven by pigeonhole, (3.2) by counting. □

This was the outcome of a discussion among B. Bukh, A. Harper and the author. We thank
E. Lindenstrauss for referring us to Konyagin's paper.
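
Both inequalities of the orbit-stabilizer theorem for sets can be tested on arbitrary subsets of a small permutation group. The sketch below assumes the symmetric group on {0, 1, 2, 3} acting on points by `g[x]`; all function names are illustrative.

```python
import itertools
import random

def mul(p, q):
    """Compose permutations stored as tuples: (p*q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def inv(p):
    r = [0] * len(p)
    for i, j in enumerate(p):
        r[j] = i
    return tuple(r)

def orbit(S, x):
    """The set Sx = {g . x : g in S} for the natural action g . x = g[x]."""
    return {g[x] for g in S}

def stabilizer_part(S, x):
    """Elements of S fixing the point x."""
    return {g for g in S if g[x] == x}

def check_orbit_stabilizer(A, B, x):
    """Check (3.1) and (3.2) for subsets A, B of the group and a point x."""
    AinvA = {mul(inv(a), b) for a in A for b in A}
    BA = {mul(b, a) for b in B for a in A}
    first = len(stabilizer_part(AinvA, x)) * len(orbit(A, x)) >= len(A)
    second = len(BA) >= len(stabilizer_part(A, x)) * len(orbit(B, x))
    return first and second
```

The first inequality is checked in the cleared-denominator form $|A^{-1}A \cap \mathrm{Stab}(x)| \cdot |Ax| \ge |A|$, so no floating-point division is involved.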

Let $H$ be a subgroup of $G$. The following lemmas are all direct consequences of
the above for the natural action $G \curvearrowright X = G/H$ defined by group multiplication.
(Set $x$ equal to $H$, the equivalence class of the identity in $G/H$.) Lemma 3.2 gives
us elements in a subgroup of $G$; Lemmas 3.3-3.4 tell us that, to obtain growth in a
group, it is enough to obtain growth in a subgroup or in a quotient.

Lemma 3.2. [Hel11, Lem. 7.2] Let $G$ be a group and $H$ a subgroup thereof. Let
$A \subset G$ be a non-empty set. Then

(3.3) $|AA^{-1} \cap H| \ge \frac{|A|}{r},$

where $r$ is the number of cosets of $H$ intersecting $A$.

Lemma 3.3. [HS, Lem. 3.5] Let $G$ be a group and $H$ a subgroup thereof. Let $A \subset G$
be a non-empty set with $A = A^{-1}$. Then, for any $k > 0$,

(3.4) $|A^{k+1}| \ge \frac{|A^k \cap H|}{|A^2 \cap H|} |A|.$

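Lemma 3.4. [Hel11, Lem. 7.4] Let $A \subset G$ be a non-empty set with $A = A^{-1}$. Then,
for any $k > 0$,

$|A^{k+2}| \ge \frac{|\pi_{G/H}(A^k)|}{|\pi_{G/H}(A)|} |A|,$

where $\pi_{G/H} : G \to G/H$ is the quotient map.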
Hints for Lemmas 3.2-3.4. Let $G \curvearrowright X = G/H$ be the natural action by multipli-
cation; let $x \in X$ be the equivalence class of the identity (i.e., $H$). For Lem. 3.3,
use first (3.1), then (3.2) (with $A^k$ instead of $A$ and $A$ instead of $B$). For
Lem. 3.4, use first (3.2) (with $A^2$ instead of $A$ and $A^k$ instead of $B$), then (3.1). □

In the above, as is often the case, the assumption $A = A^{-1}$ is inessential but
convenient from the point of view of notation. (Obviously, if $A$ is a set not fulfilling
$A = A^{-1}$, we can apply the lemmas to $A \cup A^{-1}$ rather than to $A$.)

As far as the orbit-stabilizer theorem (Lemma 3.1) is concerned, the action of $G$
on itself by multiplication is dull: all stabilizers are trivial. However, the action of
$G$ on itself by conjugation is rather interesting. Write $C_H(g)$ for the centralizer

$C_H(g) = \{h \in H : hgh^{-1} = g\}$

and $\mathrm{Cl}(g)$ for the conjugacy class

$\mathrm{Cl}(g) = \{hgh^{-1} : h \in G\}.$

We can write $C(g)$ as short for $C_G(g)$.

Lemma 3.5. Let $A \subset G$ be a non-empty set with $A = A^{-1}$. Then, for every $g \in A^l$,
$l \ge 1$,

$|A^2 \cap C(g)| \ge \frac{|A|}{|A^{l+2} \cap \mathrm{Cl}(g)|}.$

Proof. Let $G \curvearrowright G$ be the action of $G$ on itself by conjugation. Apply (3.1) with
$x = g$; the orbit of $g$ under conjugation by $A$ is contained in $A^{l+2} \cap \mathrm{Cl}(g)$. □

3.2. A pivoting argument: the affine group. We will now see how to obtain
growth in the affine group (2.7). The main ideas in the proof of Prop. 3.7 below
were extracted in [Hel11] from the proof of the sum-product theorem in [GK07]. In
[Hel11, §3] and then in [GH], a similar strategy was shown to work for more general
solvable groups. The theme of pivoting will recur in §4.4.

First, let us see how to construct many elements in U and T starting from A. 



Lemma 3.6. Let $G$ be the affine group over $\mathbb{F}_p$ (2.7). Let $U$ be the maximal unipo-
tent subgroup of $G$, and $\pi : G \to G/U$ the quotient map.

Let $A \subset G$, $A = A^{-1}$. Assume $A \not\subset \pm U$; let $x$ be an element of $A$ not in $\pm U$.
Then

(3.5) $|A^2 \cap U| \ge \frac{|A|}{|\pi(A)|}, \quad |A^2 \cap T| \ge \frac{|A|}{|A^5|} |\pi(A)|$

for $T = C(x)$.

Recall $U$ is given by (2.8). Since $x \notin \pm U$, its centralizer $T = C(x)$ is a maximal
torus.

Proof. By Lemma 3.2, $A_u := A^{-1}A \cap U$ has at least $|A|/|\pi(A)|$ elements. Consider
the action of $G$ on itself by conjugation. Then, by Lemma 3.1, $|A^{-1}A \cap \mathrm{Stab}(x)| \ge
|A|/|A(x)|$. (Here $A(x)$ is the orbit of $x$ under the action (by conjugation) of $A$.) We
set $A_t := (A^{-1}A) \cap \mathrm{Stab}(x) \subset T$. Clearly, $|A(x)| = |A(x)x^{-1}|$ and $A(x)x^{-1} \subset A^4 \cap U$,
and so $|A(x)| \le |A^4 \cap U|$. At the same time, by (3.2) applied to the action $G \curvearrowright G/U$
by left multiplication, $|A^5| = |A^4A| \ge |A^4 \cap U| \cdot |\pi(A)|$. Hence

$|A_t| \ge \frac{|A|}{|A^4 \cap U|} \ge \frac{|A|}{|A^5|} |\pi(A)|.$

□

As per previous notation, $A_t^2 = A_t \cdot A_t$, $A_t(A_u) = \{t_1(u_1) : t_1 \in A_t, u_1 \in A_u\}$ and
$t(u) = tut^{-1}$ (that is, $T$ acts on $U$ by conjugation).

Proposition 3.7. Let $G$ be the affine group over $\mathbb{F}_p$, $U$ the maximal unipotent
subgroup of $G$, and $T$ a maximal torus. Let $A_u \subset U$, $A_t \subset T$. Assume $A_u = A_u^{-1}$,
$e \in A_t, A_u$ and $A_u \ne \{e\}$. Then

$|(A_t^2(A_u))^6| \ge |A_t(A_u) A_t^2(A_u) A_t(A_u^2) A_t^2(A_u) A_t(A_u)| \ge \frac{1}{2} \min(|A_u||A_t|, p).$

Proof. Call $a \in U$ a pivot if the function $\phi_a : A_u \times A_t \to U$ given by

$(u,t) \mapsto u t(a) = u t a t^{-1}$

is injective.

Case (a): There is a pivot $a$ in $A_u$. Then $|\phi_a(A_u, A_t)| = |A_u||A_t|$, and so

$|A_u A_t(a)| \ge |\phi_a(A_u, A_t)| = |A_u||A_t|.$

This is the motivation for the name "pivot": the element $a$ is the pivot on which we
build an injection $\phi_a$, giving us the growth we want.

Case (b): There are no pivots in $U$. Choose the most "pivot-like" $a \in U$, meaning
an element $a \in U$ such that the number of collisions

$\kappa_a = |\{u_1, u_2 \in A_u, t_1, t_2 \in A_t : \phi_a(u_1,t_1) = \phi_a(u_2,t_2)\}|$

is minimal. Two distinct $(u_1,t_1)$, $(u_2,t_2)$ collide for at most one $a \in U \setminus e$; in fact,
for no $a \in U \setminus e$ when $u_1 = u_2$, $t_1 \ne t_2$ or $u_1 \ne u_2$, $t_1 = t_2$. Hence the total number
of collisions $\sum_{a \ne e} \kappa_a$ is $\le |A_u||A_t|(p-1) + |A_u|(|A_u| - 1)|A_t|(|A_t| - 1)$, and so

$\kappa_a \le \frac{|A_u||A_t|(p-1) + |A_u|(|A_u|-1)|A_t|(|A_t|-1)}{p-1} \le |A_u||A_t| + \frac{|A_u|^2|A_t|^2}{p}.$

Cauchy-Schwarz implies that $|\phi_a(A_u,A_t)| \ge |A_u|^2|A_t|^2/\kappa_a$, and so

$|\phi_a(A_u,A_t)| \ge \frac{|A_u|^2|A_t|^2}{|A_u||A_t| + \frac{|A_u|^2|A_t|^2}{p}} = \frac{1}{\frac{1}{|A_u||A_t|} + \frac{1}{p}} \ge \frac{1}{2} \min(|A_u||A_t|, p).$

We are not quite done, since $a$ may not be in $A_u$. Since $a$ is not a pivot (as there are
none), there exist distinct $(u_1,t_1)$, $(u_2,t_2)$ such that $\phi_a(u_1,t_1) = \phi_a(u_2,t_2)$. Then
$t_1 \ne t_2$, and so the map $\psi_{t_1,t_2} : U \to U$ given by $u \mapsto t_1(u)(t_2(u))^{-1}$ is injective. For
any $u \in U$, $t \in T$, since $T$ is abelian,

(3.6) $\psi_{t_1,t_2}(\phi_a(u,t)) = t_1(u) t_1(t(a)) (t_2(u) t_2(t(a)))^{-1} = t_1(u)\, t(t_1(a)(t_2(a))^{-1})\, (t_2(u))^{-1}$
$= t_1(u)\, t(\psi_{t_1,t_2}(a))\, (t_2(u))^{-1} = t_1(u)\, t(u_1^{-1}u_2)\, (t_2(u))^{-1}.$

(Note that $a$ has just disappeared.) Hence,

$\psi_{t_1,t_2}(\phi_a(A_u,A_t)) \subset A_t(A_u) A_t(A_u^2) A_t(A_u) \subset (A_t(A_u))^4.$

Since $\psi_{t_1,t_2}$ is injective, we conclude that

$|(A_t(A_u))^4| \ge |\psi_{t_1,t_2}(\phi_a(A_u,A_t))| = |\phi_a(A_u,A_t)| \ge \frac{1}{2} \min(|A_u||A_t|, p).$

There is an idea here that we are about to see again: any element $a$ that is not
a pivot can, by this very fact, be given in terms of some $u_1, u_2 \in A_u$, $t_1, t_2 \in A_t$,
and so an expression involving $a$ can often be transformed into one involving only
elements of $A_u$ and $A_t$.

Case (c): There are pivots and non-pivots in $U$. Since $A_u \ne \{e\}$, $A_u$ generates
$U$. This implies that there is a non-pivot $a \in U$ and a $g \in A_u$ such that $ga$ is a
pivot. Then $\phi_{ga} : A_u \times A_t \to U$ is injective. Much as in (3.6),

(3.7) $\psi_{t_1,t_2}(\phi_{ga}(u,t)) = t_1(u) t_1(t(g)) t_1(t(a)) (t_2(u) t_2(t(g)) t_2(t(a)))^{-1}$
$= t_1(u)\, t_1(t(g))\, t(u_1^{-1}u_2)\, (t_2(t(g)))^{-1} (t_2(u))^{-1}.$

Hence

$|A_t(A_u) A_t^2(A_u) A_t(A_u^2) A_t^2(A_u) A_t(A_u)| \ge |\psi_{t_1,t_2}(\phi_{ga}(A_u,A_t))| = |A_u||A_t|.$

The idea to recall here is that, if $S$ is a subset of an orbit $\mathcal{O} = \langle A \rangle x$ such that
$S \ne \emptyset$ and $S \ne \mathcal{O}$, then there is an $s \in S$ and a $g \in A$ such that $gs \notin S$. In other
words, we use the point at which we escape from $S$. □

We are using the fact that $G$ is the affine group over $\mathbb{F}_p$ (and not over some other
field) only at the beginning of case (c), when we say that, for $A_u \subset U$, $A_u \ne \{e\}$
implies $\langle A_u \rangle = U$.
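
Pivots are easy to experiment with once $U$ is identified with $\mathbb{F}_p$ (written additively) and $T$ with $\mathbb{F}_p^*$ acting by multiplication, so that $\phi_a(u,t) = u + ta$. The sketch below uses that identification; the function names and the sample sets in the usage note are illustrative assumptions.

```python
def phi_image(a, Au, At, p):
    """Image of phi_a(u, t) = u + t*a on Au x At, inside F_p."""
    return {(u + t * a) % p for u in Au for t in At}

def is_pivot(a, Au, At, p):
    """a is a pivot when phi_a is injective on Au x At."""
    return len(phi_image(a, Au, At, p)) == len(Au) * len(At)

def non_pivots_by_formula(Au, At, p):
    """Non-pivots are the ratios (u1 - u2) / (t2 - t1) with t1 != t2.

    A collision u1 + t1*a = u2 + t2*a between distinct pairs forces t1 != t2,
    and then determines a uniquely.
    """
    return {((u1 - u2) * pow((t2 - t1) % p, -1, p)) % p
            for u1 in Au for u2 in Au for t1 in At for t2 in At if t1 != t2}

def non_pivots_by_search(Au, At, p):
    """Non-pivots found by testing injectivity for every a in F_p."""
    return {a for a in range(p) if not is_pivot(a, Au, At, p)}
```

For instance, with $p = 13$, `Au = {0, 1, 12}` and `At = {1, 2, 3}`, both functions return the same set of non-pivots, and any remaining residue is a pivot in the sense of case (a). The elements of `At` are assumed to be distinct nonzero residues modulo $p$.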

Proposition 3.8. Let $G$ be the affine group over $\mathbb{F}_p$. Let $U$ be the maximal unipotent
subgroup of $G$, and $\pi : G \to G/U$ the quotient map.

Let $A \subset G$, $A = A^{-1}$, $e \in A$. Assume $A$ is not contained in any maximal torus.
Then either

(3.8) $|A^{57}| \ge \frac{1}{2} \sqrt{|\pi(A)|} \cdot |A|$

or

(3.9) $|A^{57}| \ge \frac{1}{2} |\pi(A)| p$ and $U \subset A^{112}$.

Proof. We can assume $A \not\subset \pm U$, as otherwise what we are trying to prove is trivial.
Let $g$ be an element of $A$ not in $\pm U$; its centralizer $C(g)$ is a maximal torus $T$. By
assumption, there is an element $h$ of $A$ not in $T$. Then $hgh^{-1}g^{-1} \ne e$. At the same
time, it does lie in $A^4 \cap U$, and so $A^4 \cap U$ is not $\{e\}$.

Let $A_u = A^4 \cap U$, $A_t = A^2 \cap T$; their size is bounded from below by (3.5). Applying
Prop. 3.7, we obtain

$|A^{56} \cap U| \ge \frac{1}{2} \min(|A_u||A_t|, p) \ge \frac{1}{2} \min\left(\frac{|A|}{|A^5|} \cdot |A|, p\right).$

By (3.2), $|A^{57}| \ge |A^{56} \cap U| \cdot |\pi(A)|$. Clearly, if $|A|/|A^5| < 1/\sqrt{|\pi(A)|}$, then $|A^{57}| \ge
|A^5| > \sqrt{|\pi(A)|} \cdot |A|$. □



The exponent 57 in (3.8) is not optimal, but, qualitatively speaking, Prop. 3.8
is as good a result as one can aim for at present: the assumption $A \not\subset T$ is necessary,
the bound $\gg |\pi(A)| \cdot p$ can be tight when $U \subset A$. For $A \subset U$, getting a better-
than-trivial bound amounts to Freiman's theorem in $\mathbb{F}_p$, and getting a growth factor
of a power $|A|^\delta$ (rather than $\sqrt{|\pi(A)|}$) would involve getting a version of Freiman's
theorem of polynomial strength (a difficult open problem).

Incidentally, (3.8) can be seen as a very simple result of the "classification of
approximate subgroups" kind: if a set $A$ grows slowly ($|A^k| \le |A|^{1+\delta}$, $k = 57$, $\delta$
small) then either $A$ is contained in a subgroup (a maximal torus) or $A$ is almost
contained in a subgroup ($U$, with "almost contained" meaning that $|\pi(A)| \le |A|^{\delta}$)
or contains a subgroup ($H = U$) such that $\langle A \rangle/H$ is nilpotent (here, in fact,
abelian).

A result of this kind was what [GH] proved for solvable subgroups and what
[Hel11] proved for $SL_3(\mathbb{F}_p)$; that is to say, one can try to classify growth in general
linear algebraic groups, leaving only the nilpotent case aside. This was called the
"Helfgott-Lindenstrauss conjecture" in [BGT12], which proved it in an impressively
general but quantitatively very weak sense. In particular, [BGT12] does recover
a proof of Gromov's theorem (close to Hrushovski's), but it does not seem strong
enough to give useful bounds for finite groups.

* * * 

The use of pivoting for general groups was first advocated in [Hel11], but it came
to full fruition only later, partly thanks to [BGT11] and [PSa]: due in part to what
in retrospect was a technical difficulty (see the remarks at the end of §4.2), [Hel11]
still uses a sum-product theorem at a certain point, though it does develop in [Hel11,
§3] the more abstract setting that we have demonstrated here in the simplest case.

4. Growth in linear algebraic groups 

Here we will go over an essentially complete and self-contained proof of Thm. 1.1.
The proof we will give is somewhat more direct and easier to generalize than that
in [Hel08]: it is influenced by [Hel11], [BGT11], [PSa], and also by the exposition in
[Kowb]. The basic elements are, however, the same: a dimensional estimate gives us
tori with many elements on them, and, aided by an escape lemma, we will be able
to use these tori to prove the theorem by contradiction, using a pivoting argument
(indirectly in [Hel08], directly here). The proof of the case $SL_2$ will be used to anchor
a more general discussion; we will introduce the concepts used in the general case,
explaining them by means of $SL_2$. We will actually prove Thm. 1.1 for a general
finite field $\mathbb{F}_q$, since we no longer have any use for the assumption that $q$ be prime.

We will then show how Thm. 1.1 and Prop. 2.5 imply Thm. 1.2 (Bourgain-
Gamburd).

4.1. Escape. At some points in the argument, we will need to make sure that we
can find an element $g \in A^k$ that is not special: for example, we want to be able to
use a $g$ that is not unipotent, that does not have a given $v$ as an eigenvector, that
is regular semisimple (i.e., has a full set of distinct eigenvalues), etc. As [BGT11]
states, arguments allowing one to do this appear in several places in the literature. The
first version of [Hel08] did this "by hand" in each case, and so does [Kowb]; that
approach is useful if one aims at optimizing bounds, but our aim here is to proceed
conceptually. The following general statement, used in [Hel11], is modelled very
closely after [EMO05, Prop. 3.2].

Lemma 4.1 (Escape). Let $G$ be a group acting linearly on a vector space $V/K$, $K$ a
field. Let $W$ be a subvariety of $V$ all of whose components have positive codimension
in $V$. Let $A \subset G$, $A = A^{-1}$, $e \in A$; let $x \in V$ be such that the orbit $\langle A \rangle \cdot x$ of $x$ is
not contained in $W$.

Then there are constants $k$, $c$ depending only on the number, dimension and degree
of the irreducible components of $W$ such that there are at least $\max(1, c|A|)$ elements
$g \in A^k$ for which $gx \notin W$.

In other words, if $x$ can escape from $W$ at all, it can escape from $W$ in a bounded
number of steps.

Proof for a special case. Let us first do the special case of $W$ an irreducible linear
subvariety. We will proceed by induction on the dimension of $W$. If $\dim(W) = 0$,
then $W$ consists of a single point, and the statement is clear: $\langle A \rangle \cdot x \not\subset \{x\}$ implies
that there is a $g_0 \in A$ such that $g_0x \ne x$; if there are fewer than $|A|/2$ such elements
of $A$, any product $g^{-1}g_0$ with $gx = x$ satisfies $g^{-1}g_0x \ne x$, and there are $> |A|/2$
such products.

Assume, then, that $\dim(W) > 0$, and that the statement has been proven for all
$W'$ with $\dim(W') < \dim(W)$. If $gW = W$ for all $g \in A$, then either (a) $gx$ does
not lie on $W$ for any $g \in A$, proving the statement, or (b) $gx$ lies on $W$ for every
$g \in \langle A \rangle$, contradicting the assumption. Assume that $gW \ne W$ for some $g \in A$; then
$W' = gW \cap W$ is an irreducible linear variety with $\dim(W') < \dim(W)$. Thus, by
the inductive hypothesis, there are at least $\max(1, c'|A|)$ elements $g' \in A^{k'}$ ($c'$, $k'$
depending only on $\dim(W')$) such that $g'x$ does not lie on $W' = gW \cap W$. Hence,
for each such $g'$, either $g^{-1}g'x$ or $g'x$ does not lie on $W$. We have thus proven the
statement with $c = c'/2$, $k = k' + 1$. □

Adapting the proof to the general case. Remove first the assumption of irreducibil-
ity; then $W$ is the union of $r$ components, not necessarily all of the same dimension.
The intersection $W' = gW \cap W$ may also have several components, but no more
than $r^2$. Let $W_1$ be a component of $W$ of maximal dimension $d$. By the argument in
the first sketch, we can find a $g \in A$ such that $gW_1 \ne W_1$. (If $gx$ does not lie on $W_1$
for any $g \in A$, we simply remove $W_1$ from $W$ and repeat.) Hence $W' = gW \cap W$ has
fewer components of dimension $d$ than $W$ does. We can thus carry out the induction
on (a) the maximum of the dimensions of the components of $W$, (b) the number of
components of maximal dimension: when (a) does not go down, it stays the same
and (b) goes down; moreover, the number of components of lower dimension stays
under control, as the total number of components $r$ gets no more than squared, as
we said.

Removing the assumption that $W$ is linear is actually easy: the same argument
works, and we only need to make sure that the total number of components (and
their degree) stays under control; this is so by Bezout's theorem (in a general form,
such as that in [DS98, p. 251] (Fulton-MacPherson)). □

As Pyber and Szabó showed in [PSa], one can merge the "escape" argument above
with the "dimensional estimates" we are about to discuss, in that, in our context,
an escape statement such as Lemma 4.1 is really a weak version of a dimensional
estimate: Lemma 4.1 tells us that many images $gx$ escape from a proper subvariety
$W$, whereas a dimensional estimate tells us that, if $A$ grows slowly, very few images
$gx$, $g \in A^k$, lie on a proper subvariety $W \subset G$. We will, however, use Lemma 4.1 as
a tool to prove dimensional estimates and other statements, much as in [Hel11] (or
[BGT11]).

4.2. Dimensional estimates. By a dimensional estimate we mean a lower or upper
bound on an intersection of the form $A^k \cap W$, where $A \subset G(K)$, $W$ is a subvariety
of $G$ and $G/K$ is an algebraic group. As the reader will notice, the bounds that we
obtain will be meaningful when $A$ grows relatively slowly. However, no assumption
on $A$ is made, other than that it generate $G(K)$.

Let us first look at a particularly simple example; we will not actually use it as such
here, but it was important in [Hel08] and [Hel11], and it exemplifies what is meant
by a "dimensional estimate" and one way in which it can be proven. (Moreover, its
higher-rank analogues do come into generalizations of what we will do to $SL_n$ and
other higher-rank groups, and the ideas in its proof will be reused later.)



Proposition 4.2 ([Hel08, Lem. 4.7]; [Hel11, Cor. 5.4], case $n = 2$). Let $G = SL_2$,
$K$ a field. Let $A \subset G(K)$ be a finite set with $A = A^{-1}$, $e \in A$. Let $T$ be a maximal
torus of $G$. Then

(4.1) $|A \cap T(K)| \ll |A^k|^{1/3},$

where $k$ and the implied constant are absolute.

A maximal torus, in $SL_2$ (or $SL_n$), is just the group of matrices that are diagonal
with respect to some fixed basis of $\overline{K}^2$. Here $G(K)$ simply means the "set of $K$-
valued points" of $G$, i.e., the group $SL_2(K)$. (In general, according to standard
formalism, an algebraic group is an abstract object (a variety plus morphisms); its
set of $K$-valued points is a group.)

The meaning of 1/3 in (4.1) is that it equals $\dim(T)/\dim(G)$. This will come
through in the proof: we will manage to fit three copies of $T$ inside $G$ in, so to
speak, independent directions.

Proof, as in [Hel11]. It would be enough to construct an injective map

$\phi : T(K) \times T(K) \times T(K) \to G(K)$

such that $\phi(T(K) \cap A, T(K) \cap A, T(K) \cap A) \subset A^k$, since then

(4.2) $|T(K) \cap A|^3 = |(T(K) \cap A) \times (T(K) \cap A) \times (T(K) \cap A)|$
$= |\phi((T(K) \cap A) \times (T(K) \cap A) \times (T(K) \cap A))| \le |A^k|.$

It is easy to see that we can relax the condition that $\phi$ be injective; for example,
it is enough to assume that every preimage $\phi^{-1}(g)$ have bounded size, and even then
we can relax the condition still further by requiring only that $\phi^{-1}(g) \cap (T(K) \setminus S)^3$
be of bounded size, where $|S|$ is itself bounded, etc. Let us first construct $\phi$ and
then see how far we have to relax injectivity.



Footnote (on the assumption, in §4.2, that $A$ generate $G(K)$): Even this can be relaxed
to require only that $\langle A \rangle$ not be contained in the union $V$ of a bounded number of
varieties of positive codimension and bounded degree, as is clear from the arguments we
will see and as [BGT11] states explicitly. This boundedness condition is called
"bounded complexity" in [BGT11]. The "complexity" in [BGT11] corresponds to the degree
vector $\overrightarrow{\deg}(V)$ in [Hel11].

We define $\phi : T(K) \times T(K) \times T(K) \to G(K)$ by

(4.3) $\phi(t_0, t_1, t_2) = t_0 \cdot g_1 t_1 g_1^{-1} \cdot g_2 t_2 g_2^{-1}$,

where $g_1, g_2 \in A^{k'}$, $k' = O(1)$, are about to be specified. It is easy to show that
there are $g_1', g_2' \in G(K)$ such that $v$, $g_1' v (g_1')^{-1}$ and $g_2' v (g_2')^{-1}$ are linearly
independent, where $v$ is a non-zero tangent vector to $T$ at the origin. Now, the pairs
$(g_1, g_2) \in G(K) \times G(K)$ for which $v$, $g_1 v g_1^{-1}$, $g_2 v g_2^{-1}$ are not linearly independent
form a subvariety $W$ of $G \times G$ of bounded degree (given by the vanishing of a determinant).
Since $G \times G$ is irreducible, and since we have shown that there is at least one point
$(g_1', g_2')$ outside $W$, we see that all the components of $W$ have to be of positive
codimension. Hence we can apply Lemma 4.1 (escape) with $A \times A$ (which generates
$G \times G$) instead of $A$ and $G \times G$ instead of $G$ and $V$, and obtain that there are
$g_1, g_2 \in A^{k'}$, $k' = O(1)$, such that $v$, $g_1 v g_1^{-1}$ and $g_2 v g_2^{-1}$ are linearly independent.
This means that the derivative of $\phi$ at the origin $(e, e, e)$ of $T \times T \times T$ is
non-degenerate when any such $g_1, g_2 \in A^{k'}$ are given.

The points of $T \times T \times T$ at which $\phi$ has degenerate derivative form, again, a
subvariety $W_0$; since $T \times T \times T$ is irreducible, and since, as we have just shown,
the origin $(e, e, e)$ does not lie on $W_0$, we see that $W_0$ is a union of components of
positive codimension. This means that there is a subvariety $W_1 \subset T \times T$ of bounded
degree (bounded, mind you, independently of $g_1$ and $g_2$), made out of components
of positive codimension, such that, for all $(t_0, t_1) \in T(K) \times T(K)$ not on $W_1$, there
are $O(1)$ elements $t_2 \in T(K)$ such that $(t_0, t_1, t_2)$ lies on $W_0$; we also see that there
is a subvariety $W_2 \subset T$ of bounded degree and positive codimension such that, for
all $t_0 \in T(K)$ not on $W_2$, there are $O(1)$ elements $t_1 \in T(K)$ such that $(t_0, t_1)$ lies
on $W_1$.

Given any point $y$ on $G(K)$, its preimage under the restriction $\phi|_{(T \times T \times T) \setminus W_0}$
lies on a variety of dimension zero: if this were not the case, the preimage $\phi^{-1}(y)$
would be a variety $V_y$ such that there is at least one point $x$ not on $W_0$ lying on
a component of positive dimension of $V_y$. There would then have to be a non-zero
tangent vector to $V_y$ at $x$, and we see that its image under $D\phi$ would be 0, i.e., $D\phi$
would be degenerate at $x$, implying that $x$ lies on $W_0$; contradiction.

The preimage of $y$ under $\phi|_{(T \times T \times T) \setminus W_0}$, besides being zero-dimensional, is also of
bounded degree, because $\phi$ is of bounded degree. Hence the preimage consists of at
most $C$ points, $C$ a constant.

Similarly, considering the boundedness of the degrees of $W_0$, $W_1$ and $W_2$, we see
that there are at most $O(1)$ points on $W_2$, there are at most $|A \cap T(K)| \cdot O(1)$
points $(t_0, t_1) \in (A \cap T(K)) \times (A \cap T(K))$ on $W_1$ for $t_0$ not on $W_2$, and there are at
most $|A \cap T(K)|^2 \cdot O(1)$ points $(t_0, t_1, t_2) \in (A \cap T(K)) \times (A \cap T(K)) \times (A \cap T(K))$
on $W_0$ for $(t_0, t_1)$ not on $W_1$. Hence

$|\{x \in X : x \notin W_0(K)\}| \ge |A \cap T(K)|^3 - O(|A \cap T(K)|^2)$

for $X = (A \cap T(K)) \times (A \cap T(K)) \times (A \cap T(K))$, and so

$|\phi(\{x \in X : x \notin W_0(K)\})| \ge \frac{1}{C}\left(|A \cap T(K)|^3 - O(|A \cap T(K)|^2)\right)$.

Since $\phi(X)$ lies in $A^k$ for $k = 4k' + 3$ (one factor from $A$ for each of $t_0$, $t_1$, $t_2$ and
$k'$ factors for each of $g_1$, $g_1^{-1}$, $g_2$, $g_2^{-1}$), we see that

$|A \cap T(K)|^3 - O(|A \cap T(K)|^2) \le C|A^k|$,

and so $|A \cap T(K)| \ll |A^k|^{1/3}$. □
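The last deduction can be spelled out as follows. This is only a sketch of a routine step; the names $n$ and $c$ are ours, with $c$ standing for the implied constant in the $O(\cdot)$ term and $C$ for the fibre bound from the proof.

```latex
\[
n = |A \cap T(K)|, \qquad n^3 - c\,n^2 \le C\,|A^k| .
\]
\[
n \ge 2c \;\Longrightarrow\; \tfrac{1}{2}\,n^3 \le n^3 - c\,n^2 \le C\,|A^k|
\;\Longrightarrow\; n \le (2C)^{1/3}\,|A^k|^{1/3},
\]
\[
n < 2c \;\Longrightarrow\; n < 2c \le 2c\,|A^k|^{1/3}
\qquad (\text{since } |A^k| \ge 1).
\]
```

In both cases $n \ll |A^k|^{1/3}$ with an implied constant depending only on $c$ and $C$.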

The same proof works for $SL_n$ and, indeed, for all classical Chevalley groups (at
least) [Hel11, Cor. 5.4]. The proof above is more conceptual than that of [Hel08,
Lem. 4.7] (a computation), and thus generalizes more easily: [Hel11] carries out the
same argument for non-maximal tori [Hel11, §5.5] and unipotent subgroups [Hel11,
§9.1]. The main step was stated in general terms in [Hel11, Prop. 4.12].

However, the situation was still not fully satisfactory. There is one passage in
the proof of Prop. 4.2 that isn't quite abstract enough: the one that starts with "It
is easy to show". This is a very simple computation for $SL_2$, and fairly easy even
for $SL_n$. The problem is that it has to be done from scratch for a specific algebraic
group $G$ and a specific variety $V$ every time we want to generalize Prop. 4.2. This
means, first, that we have to go through Lie group types if we want a statement
that is general on $G$, and that is tedious. Second, and more importantly, this kept
the author from giving a statement for fully general $V$ in [Hel11], as opposed to a
series of statements for different varieties $V$.

A full generalization in these two senses was achieved independently by [PSa] and
[BGT11]. It turns out that all we need to know about the algebraic group $G$ is
that it is simple (or almost simple, like $SL_n$). Under that condition, and assuming
that $\langle A \rangle = G(K)$, Pyber and Szabó proved [PSa, Thm. 24] that, for every subvariety
$V \subset G$ of positive dimension and every $\epsilon > 0$,

(4.4) $|A \cap V| \ll |A^k|^{(1+\epsilon) \dim(V)/\dim(G)}$,

where $k$ and the implied constant depend only on $\epsilon$, on the degree and number of
the components of $V$ and on the rank (and Lie type) of $G$. This they did by greatly
generalizing and strengthening the arguments in [Hel11] (such as, for example, the
proof of Prop. 4.2 above).

The route followed by the authors of [BGT11] was a little different. By then
[Hel11] was known - the first version was made public in 2008 - and the author had
conversed at length with one of the authors of [BGT11] about the ideas involved
and the difficulties remaining. The preprint of Larsen-Pink [LP11] was available, as
were works by Hrushovski-Wagner [HW08] and Hrushovski [Hru12]. The aim of
[LP11] was to give a classification of subgroups of $GL_n$ without using the Classification
Theorem of finite simple groups. This involved stating and proving (4.4) (without
$\epsilon$) for $A$ a subgroup of $G(K)$ [LP11, Thm. 4.2]. This proof turned out to be robust
(as Hrushovski and Wagner's model-theoretical work may have already indicated):
[BGT11] adapted it to the case of $A$ a set, obtaining (4.4) for every $A$ with $\langle A \rangle =
G(K)$ (without the $\epsilon$, i.e., strengthened).

Footnote (on the works of Hrushovski-Wagner and Hrushovski cited above): Pyber and Szabó
also mention the earlier paper [HP95] (Hrushovski-Pillay) as an influence. Part of the
role of [Hru12] was to make the work of Larsen-Pink clearer. According to [LP11,
p. 1108], there was actually a gap in the original version of [LP11], and [Hru12] filled
that gap, besides giving a more general statement.

Rather than prove (4.4) here in full generality, we will prove it in a case in which we
need it. This is a case that, for $SL_2$, would have been accessible by an approach even
more concrete than that in [Hel08], as [Kowb] shows. However, the more conceptual
proof below is arguably simpler, and also displays the main ideas in the proof of the
general statement (4.4).



Proposition 4.3. Let $G = SL_2$, $K$ a field. Let $A \subset G(K)$ be a finite set with
$A = A^{-1}$, $e \in A$. Let $g$ be a regular semisimple element of $G(K)$. Then

(4.5) $|A \cap \mathrm{Cl}(g)| \ll |A^k|^{2/3}$,

where $k$ is an absolute constant.

For $G = SL_2$, $g$ is regular semisimple if and only if it has two distinct eigenvalues.
In that case, $\mathrm{Cl}(g)$ is just the subvariety $W$ of $G$ defined by $W = \{x : \mathrm{tr}(x) = \mathrm{tr}(g)\}$.
(In general, in any linear algebraic group $G$, the conjugacy class of a semisimple
element is a (closed) variety [Spr86, Cor. 2.4.4].) Thus, $\dim(\mathrm{Cl}(g))/\dim(G) = 2/3$;
this is the meaning of the exponent in (4.5). The centralizer of a regular semisimple
element is a torus.
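The description of $\mathrm{Cl}(g)$ as a trace level set can be checked by brute force over a small field. The sketch below assumes $K = F_7$ and $g = \mathrm{diag}(2, 4)$ (our choices, for illustration only); it compares the conjugacy class of $g$ in $SL_2(F_7)$ with the set of matrices of the same trace and computes the centralizer.

```python
from itertools import product

p = 7
# All of SL_2(F_p), with a matrix [[a, b], [c, d]] stored as the tuple (a, b, c, d).
G = [(a, b, c, d) for a, b, c, d in product(range(p), repeat=4)
     if (a * d - b * c) % p == 1]

def mul(x, y):
    a, b, c, d = x
    e, f, g_, h = y
    return ((a*e + b*g_) % p, (a*f + b*h) % p, (c*e + d*g_) % p, (c*f + d*h) % p)

def inv(x):
    a, b, c, d = x          # inverse of a determinant-1 matrix
    return (d, -b % p, -c % p, a)

g = (2, 0, 0, 4)            # diag(2, 1/2): two distinct eigenvalues, since 2*4 = 1 mod 7
conj_class = {mul(mul(h, g), inv(h)) for h in G}
trace_set = {x for x in G if (x[0] + x[3]) % p == (g[0] + g[3]) % p}
centralizer = [h for h in G if mul(h, g) == mul(g, h)]

print(len(G), len(conj_class), len(centralizer))
```

Here the class has $|G|/|C(g)| = 336/6 = 56$ elements, of order $p^2$, while the centralizer (the diagonal torus) has order $p - 1$.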

The proof below is a little closer to [LP11] (apud [BGT11]) than to [PSa], and
also brings in some more ideas from [Hel11]. In both [LP11] and [PSa], recursion is
used to reduce the problem to one for lower-dimensional varieties (not unlike what
happens in the proof of Lem. 4.1).



Proof of Prop. 4.3. Write $Y$ for the variety $\mathrm{Cl}(g)$. We start as in the proof of
Prop. 4.2, defining a map $\phi : Y \times Y \to G$ by

(4.6) $\phi(y_0, y_1) = y_0 y_1$.

(We do not bother to conjugate as in (4.3) because $Y$ is invariant under conjugation;
it is also invariant under inversion.) The preimage of a generic point of $G$ is not,
unfortunately, 0-dimensional, since $\dim(Y \times Y) = 2 \cdot 2 > 3 = \dim(G)$. Let $g \in
\phi(Y \times Y)$. The preimage of $g$ is

(4.7) $\{(y_0, y_0^{-1} g) : y_0 \in Y, \; y_0^{-1} g \in Y\} = \{(y_0, y_0^{-1} g) : y_0 \in Y \cap gY\}$.

It is clear from this that the dimension of the preimage of $g$ equals $\dim(Y \cap gY)$, and
so there are at most two points with 2-dimensional preimage, namely, $g = \pm e \in G$.
Assume $g \ne \pm e$.
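The fibre description (4.7) can be tested numerically. The following sketch assumes $K = F_7$ and $Y$ the class of $\mathrm{diag}(2, 4)$, i.e. the matrices of trace 6 (our choices, for illustration); it counts, for every $h \in G$, the pairs $(y_0, y_1) \in Y \times Y$ with $y_0 y_1 = h$ and compares the count with $|Y \cap hY|$.

```python
from collections import Counter
from itertools import product

p = 7
G = [(a, b, c, d) for a, b, c, d in product(range(p), repeat=4)
     if (a * d - b * c) % p == 1]

def mul(x, y):
    a, b, c, d = x
    e_, f, g_, h = y
    return ((a*e_ + b*g_) % p, (a*f + b*h) % p, (c*e_ + d*g_) % p, (c*f + d*h) % p)

e = (1, 0, 0, 1)
minus_e = (p - 1, 0, 0, p - 1)
Y = {x for x in G if (x[0] + x[3]) % p == 6}       # trace level set = Cl(diag(2, 4))

# fibre[h] = number of (y0, y1) in Y x Y with y0 * y1 = h
fibre = Counter(mul(y0, y1) for y0 in Y for y1 in Y)

# (4.7): the fibre over h is in bijection with Y ∩ hY
identity_holds = all(fibre[h] == len(Y & {mul(h, y) for y in Y}) for h in G)
max_noncentral = max(fibre[h] for h in G if h not in (e, minus_e))

print(len(Y), fibre[e], max_noncentral, identity_holds)
```

The fibre over $e$ is all of $Y$, while the fibres over non-central points are strictly smaller, in line with the dimension count in the text.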

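By the usual $|\text{image}| \ge |\text{domain}| / |\text{largest preimage}|$ argument, (4.7) implies

$|\phi(\{(y_0, y_1) \in (A \cap Y) \times (A \cap Y) : y_0 y_1 \ne \pm e\})| \ge \frac{|A \cap Y| (|A \cap Y| - 2)}{\max_{g \ne \pm e} |A \cap Y \cap gY|}$

and, since $\phi(\{(y_0, y_1) \in (A \cap Y) \times (A \cap Y)\}) \subset A^2$, we see that

(4.8) $|A \cap Y| \le 2 + \sqrt{|A^2| \cdot \max_{g \ne \pm e} |A \cap Y \cap gY|}$.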



Footnote (on the recursion in [LP11] and [PSa] mentioned before the proof): The $\epsilon$ in
(4.4) appears in [PSa] because the recursion there does not always end in the
zero-dimensional case; rather, an excess concentration on a variety gets shuttled back
and forth ("transport", [PSa, Lem. 27]) and augmented by itself a bounded number of
times, until it is too large, yielding a contradiction and thereby proving (4.4).

Hence, we should aim to bound $|A \cap Y \cap gY|$ from above. The number of components
of $Y \cap gY$ is $O(1)$ because the degree of $Y$ and $gY$ is bounded; let $Z$ be the irreducible
component of $Y \cap gY$ containing the most elements of $A$. Since $g \ne \pm e$, $\dim(Z) \le 1$;
we can assume that $\dim(Z) = 1$, as otherwise what we wish to prove is trivial. We
want to bound $|A \cap Z|$. (This is the recursion we referred to before; we are descending
to a lower-dimensional variety $Z$.)

We will now consider a map $\psi : Z \times Z \times Z \to G$ given by

$\psi(z_0, z_1, z_2) = z_0 \cdot g_1 z_1 g_1^{-1} \cdot g_2 z_2 g_2^{-1}$.

Much as in the proof of Prop. 4.2, we wish to show that there are $g_1, g_2 \in A^{k'}$ such
that, at all points of $Z \times Z \times Z$ outside a proper subvariety $W_0$ of $Z \times Z \times Z$, the
derivative of $\psi$ is non-degenerate. Just as before, it will be enough to find a single
point of $Z \times Z \times Z$ at which the derivative is non-degenerate. Choose any point $z_0$
on $Z$; we will look at the point $(z_0, z_0, z_0)$.

We write $\mathfrak{z}$ for the tangent space of $Z z_0^{-1}$ at the origin; it is a subspace of the
Lie algebra of $G$. First, we compare the vector space $\mathfrak{z}$ (obtained by deriving
$\psi(z, z_0, z_0)\psi(z_0, z_0, z_0)^{-1}$ at $z = z_0$) and the vector space $z_0 g_1 \mathfrak{z} g_1^{-1} z_0^{-1}$ (obtained
by deriving $\psi(z_0, z, z_0)\psi(z_0, z_0, z_0)^{-1}$ at $z = z_0$). We would like them to be linearly
independent, i.e., intersect only at the origin; this is the same as asking whether
there is a $g \in G(K)$ such that $\mathfrak{z}$ and $g \mathfrak{z} g^{-1}$ are linearly independent, since we can set
$g_1 = z_0^{-1} g$. If there were no such $g$, then $g \mathfrak{z} g^{-1} = \mathfrak{z}$ for all $g \in G(K)$, and so $\mathfrak{z}$ would
be an ideal of the Lie algebra $\mathfrak{g}$ of $G$ (i.e., it would be invariant under the action ad
of the Lie bracket). However, this is impossible, since a simple (or almost simple)
group of Lie type $G$ has a simple Lie algebra (prove as in [Bou72, III.9.8, Prop. 27]),
i.e., an algebra $\mathfrak{g}$ without ideals other than itself and (0). Hence there is a $g_1$ such
that $\mathfrak{z}$ and $z_0 g_1 \mathfrak{z} g_1^{-1} z_0^{-1}$ are linearly independent. This means that a determinant
does not vanish at $g_1$; thus we see that $\mathfrak{z}$ and $z_0 g_1 \mathfrak{z} g_1^{-1} z_0^{-1}$ are linearly independent
for $g_1$ outside a subvariety $W$ that is a union of (a bounded number of) components
of positive codimension (and bounded degree). If $|K|$ is larger than a constant, a
simple bound on the number of points on varieties (weaker than either Lang-Weil
or Schwartz-Zippel; [LW54, Lem. 1] is enough) shows that $W(K) \ne G(K)$; we can
assume that $|K|$ is larger than a constant, as otherwise the statement of the proposition
we are trying to show is trivial. By escape from subvarieties (Lem. 4.1), it
follows there is a $g_1 \in A^{k'}$, $k' = O(1)$, lying outside $W$; let us fix one such $g_1$.

We also want

$(z_0 g_1 z_0 g_1^{-1}) \, g_2 \mathfrak{z} g_2^{-1} \, (z_0 g_1 z_0 g_1^{-1})^{-1}$,

obtained by deriving $\psi(z_0, z_0, z)\psi(z_0, z_0, z_0)^{-1}$ at $z = z_0$, to be linearly independent
from $\mathfrak{z} + z_0 g_1 \mathfrak{z} g_1^{-1} z_0^{-1}$. This is done by exactly the same argument: $\mathfrak{z} + z_0 g_1 \mathfrak{z} g_1^{-1} z_0^{-1}$
cannot be an ideal of $\mathfrak{g}$ (because there are none, other than $\mathfrak{g}$ and (0)) and so there
is a subvariety $W' \subset G$ (the union of a bounded number of components of positive
codimension and bounded degree) such that the linear independence we want does
hold for $g_2$ outside $W'$. Again by a bound on the number of points, followed by
escape (Lem. 4.1), there is a $g_2 \in A^{k'}$, $k' = O(1)$, lying outside $W'$.

Footnote (on "almost simple" above): Meaning that $G$ has no connected normal algebraic
subgroups other than itself and the identity; this is the case for $G = SL_2$.

Thus, we have $g_1, g_2 \in A^{k'}$ such that $\psi$ has a non-degenerate derivative at at least
one point (namely, $(z_0, z_0, z_0)$). This means that $\psi$ has non-degenerate derivative
outside a proper subvariety $W_0 \subset Z \times Z \times Z$ (consisting of a bounded number of
components of bounded degree). We finish exactly as in the proof of Prop. 4.2, using
the same counting argument to conclude that, for $X = (A \cap Z(K))^3$,

$|\psi(\{x \in X : x \notin W_0(K)\})| \gg |A \cap Z(K)|^3 - O(|A \cap Z(K)|^2)$,

and so $|A \cap Z(K)| \ll |A^k|^{1/3}$ for some $k = O(1)$. Substituting this into (4.8), we
obtain that

$|A \cap Y| \ll \sqrt{|A^2| \cdot |A^k|^{1/3}} \le |A^k|^{2/3}$

for $A$ non-empty, as desired. □

Remark. The proof of Prop. 4.3 lets itself be generalized fairly easily. In particular,
note that we used no properties of $Z$ other than the fact that it was an
irreducible one-dimensional variety. Thus, we have actually shown
that

$|A \cap Z(K)| \ll |A^k|^{1/\dim(G)}$

for any one-dimensional irreducible subvariety $Z$ of a simple (or almost simple) group
of Lie type $G$; the implied constant depends only on the degree of $Z$, the dimension
of $G$ and the degree of the group operation in $G$ as a morphism. The main way in
which the proof was easier than a full proof of

$|A \cap V| \ll |A^k|^{\dim(V)/\dim(G)}$

is that, since we were dealing with low-dimensional varieties, the inductive process
was fairly simple, as were some of the counting arguments. However, the basic idea
of the general inductive process is the same as here - go down in dimension, keeping
the degree under control. It is not necessary for the maps used in the proof to be
roughly injective (like the map $\phi$ in (4.3)), as long as the preimage of a generic point
is a variety whose dimension is smaller than the dimension of the domain (as is the
case for the map $\phi$ in (4.6)). This means, in particular, that we need not try to make
the Lie algebras $\mathfrak{z}$, $g \mathfrak{z} g^{-1}$ be linearly independent - it is enough to ask that $g \mathfrak{z} g^{-1}$
not be equal to $\mathfrak{z}$ (clearly a weaker condition when $\dim(\mathfrak{z}) > 1$); we get $g \mathfrak{z} g^{-1} \ne \mathfrak{z}$
easily by the same argument as in the proof of Prop. 4.3, using the simplicity of $G$.

* * * 

We can now use the orbit-stabilizer theorem for sets to convert the upper bound
given by Prop. 4.3 into a lower bound for $|A^k \cap T(K)|$.

Corollary 4.4. Let $G = SL_2$, $K$ a field. Let $A \subset G(K)$ be a finite set with $A = A^{-1}$,
$e \in A$. Let $g$ be a regular semisimple element of $A^\ell$, $\ell \ge 1$. Assume $|A^3| \le |A|^{1+\delta}$,
$\delta > 0$. Then

$|A^k \cap C(g)| \gg |A|^{1/3 - O(\delta)}$,

where $k$ and the implied constants depend only on $\ell$.

This corresponds to [LP11, Thm. 6.2]. The centralizer $C(g)$ is a maximal torus
$T$. Clearly, $1/3 = \dim(T)/\dim(G)$.

Proof. By Prop. 4.3 and Lemma 3.5 (orbit-stabilizer),

$|A^k \cap C(g)| \gg \frac{|A|}{|A^{k\ell}|^{2/3}}$.

The inequality $|A^{k\ell}| \ll |A|^{1+O(\delta)}$ follows from $|A^3| \le |A|^{1+\delta}$ and (2.3). □
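The pigeonhole step behind the orbit-stabilizer argument can be checked on a small example. The sketch below assumes the statement $|A^{-1}A \cap \mathrm{Stab}(x)| \ge |A|/|A \cdot x|$ (our reading of Lemma 3.5, which is not reproduced in this section), applied to the action of $SL_2(F_5)$ on itself by conjugation with $x = g = \mathrm{diag}(2, 3)$; the random set `A` and the seed are arbitrary choices.

```python
import random
from itertools import product

p = 5
G = [(a, b, c, d) for a, b, c, d in product(range(p), repeat=4)
     if (a * d - b * c) % p == 1]

def mul(x, y):
    a, b, c, d = x
    e, f, g_, h = y
    return ((a*e + b*g_) % p, (a*f + b*h) % p, (c*e + d*g_) % p, (c*f + d*h) % p)

def inv(x):
    a, b, c, d = x
    return (d, -b % p, -c % p, a)

random.seed(0)
A = random.sample(G, 30)                 # an arbitrary finite subset of G
g = (2, 0, 0, 3)                         # regular semisimple: eigenvalues 2 and 3

orbit = {mul(mul(a, g), inv(a)) for a in A}               # {a g a^{-1} : a in A}
A_inv_A = {mul(inv(a), b) for a in A for b in A}          # A^{-1} A
stab_hits = [h for h in A_inv_A if mul(h, g) == mul(g, h)]  # A^{-1}A ∩ C(g)

print(len(A), len(orbit), len(stab_hits))
```

Since the orbit lies on the conjugacy class of $g$, an upper bound on the class (Prop. 4.3) turns into a lower bound on the number of elements of $A^{-1}A$ in the centralizer.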

Already [Hel08, Prop. 4.1] proved Cor. 4.4 for some $g \in A$ (i.e., there exists a
torus $T = C(g)$ with a large intersection with $A^k$). The same proof as in [Hel08]
shows that this is true for most $g \in A$, but it does not prove it for all $g \in A$. This
was, in retrospect, an important technical weakness.

The existence of a torus $T = C(g)$ with a large intersection with $A^k$ played a
crucial role in [Hel08] and [Hel11], but the fact that the version of Cor. 4.4 being
proven there was weaker than the one given here made the rest of the argument
more indirect and harder to generalize.

4.3. High multiplicity and spectral gaps, I. In order to supplement our main
argument, we will need to be able to show that, if $A$ is very large ($|A| \ge |G|^{1-\delta}$,
$\delta > 0$ small), then $(A \cup A^{-1} \cup \{e\})^k = G$. (See the statement of Thm. 1.1.) This task is
not particularly hard; in [Hel08], it was done "by hand", using a descent to a Borel
subgroup and results on large subsets of $F_p$. As Nikolov and Pyber later pointed
out, one can obtain a stronger result (with $k = 3$) in a way that generalizes very
easily. This requires a key concept - that of high eigenvalue multiplicity - which
will appear again in §4.5.



Proposition 4.5 (Frobenius). Let $G = SL_2(F_q)$, $q = p^\alpha$, $p$ odd. Then every
non-trivial complex representation of $G$ has dimension at least $(q - 1)/2$.

It is of course enough to show that every irreducible non-trivial complex
representation has dimension at least $(q - 1)/2$.

Proof. By, e.g., the character tables in [FH91, §5.2]. See also the standard reference
[LS74] or the exposition in [DSV03, §3.5] (for $G = PSL_2(F_p)$, $p$ prime). □

There are analogues of Prop. 4.5 for all finite simple groups of Lie type.

Now consider a Cayley graph $\Gamma(G, A)$, where $A$ generates $G$ and $A = A^{-1}$; we
recall that this is defined to be the graph having $G$ as its set of vertices and $\{(g, ag) :
g \in G, a \in A\}$ as its set of edges. The (normalized) adjacency matrix $\mathscr{A} : L \to L$ is
a linear operator on the space $L$ of complex-valued functions on $G$: it is defined by

(4.9) $(\mathscr{A} f)(g) = \frac{1}{|A|} \sum_{a \in A} f(ag)$.

(Thus, the discrete Laplacian $\Delta$ we spoke of in the introduction equals $I - \mathscr{A}$.) Since
$A$ is symmetric, $\mathscr{A}$ has a full real spectrum

(4.10) $\ldots \le \lambda_2 \le \lambda_1 \le \lambda_0 = 1$

with orthogonal eigenvectors $v_j$; the eigenvector $v_0$ corresponding to the highest
eigenvalue $\lambda_0$ is just a constant function.

We can see from the definition (4.9) that every eigenspace of $\mathscr{A}$ is invariant under
the action of $G$ on the right; in other words, it is a representation - and it can be
trivial only for the eigenspace consisting of constant functions, i.e., the eigenspace
associated to $\lambda_0$. Hence, by Prop. 4.5, every eigenvalue $\lambda_j$, $j > 0$, has multiplicity
at least $(q - 1)/2$.

The idea now is to use this high multiplicity to show a spectral gap, i.e., a
non-trivial upper bound for $\lambda_1$. Let us follow [Gow08], which shows that this is not hard
for $A$ large. The trace of $\mathscr{A}^2$ can be written in two ways: on one hand, it is $1/|A|^2$
times the number of length-2 paths whose head equals their tail, and, on the other,
it equals a sum of squares of eigenvalues. In other words,

$\frac{|G||A|}{|A|^2} = \sum_j \lambda_j^2 \ge \frac{q-1}{2} \lambda_j^2$

for every $j \ge 1$, and so

$|\lambda_j| \le \sqrt{\frac{|G|/|A|}{(q-1)/2}} = \sqrt{\frac{2|G|}{|A|(q-1)}}$.

If $A$ is large enough (close to $|G|$ in size), this is much smaller than $\lambda_0 = 1$. This
means that a few applications of $\mathscr{A}$ "uniformize" any distribution very quickly, in
that anything orthogonal to a constant function gets multiplied by $\lambda_1 < 1$ (or less)
repeatedly. The proof of the following result is based on this idea.
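The trace identity and the resulting eigenvalue bound can be evaluated on a small case. The sketch below assumes $G = SL_2(F_5)$ and takes for $A$ the set of elements of non-zero trace (a symmetric set chosen by us for illustration; it is not claimed to appear in the source).

```python
import math
from fractions import Fraction
from itertools import product

q = 5
G = [(a, b, c, d) for a, b, c, d in product(range(q), repeat=4)
     if (a * d - b * c) % q == 1]

def mul(x, y):
    a, b, c, d = x
    e, f, g_, h = y
    return ((a*e + b*g_) % q, (a*f + b*h) % q, (c*e + d*g_) % q, (c*f + d*h) % q)

e = (1, 0, 0, 1)
# Symmetric set: tr(x^{-1}) = tr(x) in SL_2, so A = A^{-1}.
A = [x for x in G if (x[0] + x[3]) % q != 0]

# Closed paths g -> a g -> b a g = g: the condition b a = e does not depend on g,
# so the count is |G| times the number of pairs (a, b) in A x A with b a = e.
closed_paths = len(G) * sum(1 for a in A for b in A if mul(b, a) == e)
trace_sq = Fraction(closed_paths, len(A) ** 2)        # trace of the squared operator

# Bound on |lambda_j|, j >= 1, from multiplicity >= (q - 1)/2.
bound = math.sqrt(2 * len(G) / (len(A) * (q - 1)))
print(len(G), len(A), trace_sq, round(bound, 3))
```

With $|A| = 90$ out of $|G| = 120$ the bound is already below 1, which is the spectral gap used in Prop. 4.6.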

Proposition 4.6 ([Gow08] and [NP11]). Let $G = SL_2(F_q)$, $q$ an odd prime power.
Let $A \subset G$, $A = A^{-1}$. Assume $|A| \ge 2|G|^{8/9}$. Then

$A^3 = G$.

Neither [Gow08] nor [NP11] requires $A = A^{-1}$; we are assuming it for simplicity.

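Proof. Suppose there is a $g \in G$ such that $g \notin A^3$. Then the inner product
$\langle \mathscr{A} 1_A, 1_{gA^{-1}} \rangle$ equals 0. We can assume that the eigenvectors $v_j$ have $\ell_2$-norm 1
(relative to the counting measure on $G$, say). Then

(4.11) $\langle \mathscr{A} 1_A, 1_{gA^{-1}} \rangle = \lambda_0 \langle 1_A, v_0 \rangle \langle v_0, 1_{gA^{-1}} \rangle + \sum_{j > 0} \lambda_j \langle 1_A, v_j \rangle \langle v_j, 1_{gA^{-1}} \rangle = \frac{|A|^2}{|G|} + R$,

where, by Cauchy-Schwarz and the bound on $|\lambda_j|$ above,

$|R| \le \sqrt{\frac{2|G|}{|A|(q-1)}} \, |1_A|_2 \, |1_{gA^{-1}}|_2 = \sqrt{\frac{2|G||A|}{q-1}}$.

By $|G| = (q^2 - 1)q$, however, $|A| \ge 2|G|^{8/9}$ implies $|A|^2/|G| > \sqrt{2|G||A|/(q - 1)}$,
and so (4.11) means that $\langle \mathscr{A} 1_A, 1_{gA^{-1}} \rangle$ cannot be 0. □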
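The numerical inequality used at the end of the proof can be verified directly. The following is a sketch of the arithmetic, using only $|G| = (q^2 - 1)q < q^3$ and $q \ge 3$:

```latex
\[
|A| \ge 2|G|^{8/9} \;\Longrightarrow\; |A|^3 \ge 8\,|G|^{8/3},
\qquad
|G|^{1/3} < q \le 4(q-1) .
\]
\[
8\,|G|^{8/3} = \frac{2|G|^{3}}{q-1}\cdot\frac{4(q-1)}{|G|^{1/3}} > \frac{2|G|^{3}}{q-1}
\;\Longrightarrow\;
|A|^3 > \frac{2|G|^3}{q-1}
\;\Longleftrightarrow\;
\frac{|A|^2}{|G|} > \sqrt{\frac{2|G||A|}{q-1}} .
\]
```

The last equivalence follows by multiplying $|A|^3 > 2|G|^3/(q-1)$ by $|A|/|G|^2$ and taking square roots.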
4.4. Growth in SL2(Fq). We finally come to the proof of Thm. 1.1. This is a
"modern" proof, without any reliance on the sum-product theorem, and with a
fairly straightforward generalization to higher-rank groups. This part is a little
closer to [PSa] (and, in a sense, [Hel11]) than to the first version of [BGT11]. Note
the parallels with Prop. 3.7 (pivoting).

Proof of Theorem 1.1. By (2.2), we can assume that $A = A^{-1}$ and $e \in A$ without
loss of generality. We can also assume $|A|$ to be larger than an absolute constant, as
otherwise $|A \cdot A| \ge |A| + 1$ gives us $|A^3| \ge |A|^{1+\delta}$ trivially. (If $|A \cdot A| \le |A|$, then
$A \cdot A \supset A \cdot e = A$ implies $A \cdot A = A$, and, since $A$ generates $G$, $A \cdot A = A$ implies
that $A = G$.)

Assume $|A^3| < |A|^{1+\delta}$, where $\delta > 0$ will be set later. By Lemma 4.1 (Escape),
there is an element $g_0 \in A^{O(1)}$ that is regular semisimple (that is, $\mathrm{tr}(g_0) \ne \pm 2$). Its
centralizer $T = C_G(g_0)$ is a maximal torus.

Call $\xi \in G$ a pivot if the function $\phi_\xi : A \times T \to G$ defined by

(4.12) $(a, t) \mapsto a \xi t \xi^{-1}$

is injective when considered as a function from $A/\{\pm e\} \times T/\{\pm e\}$ to $G/\{\pm e\}$. (The
analogy with the proof of Prop. 3.7 is deliberate.)
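The pivot condition can be tested by brute force in a small group. The sketch below assumes $G = SL_2(F_5)$ and $T$ the diagonal torus; the helper names `mod_sign` and `is_pivot` are ours. It checks injectivity of (4.12) modulo $\{\pm e\}$ for a few choices of $A$ and $\xi$.

```python
from itertools import product

p = 5
G = [(a, b, c, d) for a, b, c, d in product(range(p), repeat=4)
     if (a * d - b * c) % p == 1]
T = [(t, 0, 0, pow(t, -1, p)) for t in range(1, p)]     # diagonal torus

def mul(x, y):
    a, b, c, d = x
    e, f, g_, h = y
    return ((a*e + b*g_) % p, (a*f + b*h) % p, (c*e + d*g_) % p, (c*f + d*h) % p)

def inv(x):
    a, b, c, d = x
    return (d, -b % p, -c % p, a)

def mod_sign(x):
    """Canonical representative of the class {x, -x} in G/{±e}."""
    return min(x, tuple(-v % p for v in x))

def is_pivot(xi, A):
    """Is (a, t) -> a xi t xi^{-1} injective on A/{±e} x T/{±e}, as in (4.12)?"""
    pairs = {(mod_sign(a), mod_sign(t)) for a in A for t in T}
    images = {mod_sign(mul(mul(a, mul(xi, t)), inv(xi))) for a, t in pairs}
    return len(images) == len(pairs)

print(is_pivot((1, 1, 0, 1), T), is_pivot((1, 0, 0, 1), T), is_pivot((1, 1, 0, 1), G))
```

For $A = T$, the element $\xi = e$ is not a pivot, whereas a $\xi$ with $\xi T \xi^{-1} \cap T = \{\pm e\}$ is; for $A = G$ no $\xi$ can be a pivot, by counting.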

Case (a): There is a pivot $\xi$ in $A$. Since $T = C(g_0)$, Cor. 4.4 (together with
$|A^3| < |A|^{1+\delta}$) gives us that there are $\gg |A|^{1/3 - O(\delta)}$ elements of $T$ in $A^k$. Hence,
by the injectivity of $\phi_\xi$,

$|\phi_\xi(A, A^k \cap T)| \ge \frac{1}{4} |A| |A^k \cap T| \gg |A|^{4/3 - O(\delta)}$.

At the same time, $\phi_\xi(A, A^k \cap T) \subset A^{k+3}$, and so

$|A^{k+3}| \gg |A|^{4/3 - O(\delta)}$.

For $\delta$ smaller than a positive constant, this gives a contradiction to $|A^3| < |A|^{1+\delta}$
by Ruzsa's inequality (2.3). (Recall that we can assume that $|A|$ is larger than an
absolute constant.)

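Case (b): There are no pivots $\xi$ in $G$. Then, for every $\xi \in G$, there are $a_1, a_2 \in A$,
$t_1, t_2 \in T$, $(a_1, t_1) \ne (\pm a_2, \pm t_2)$ such that $a_1 \xi t_1 \xi^{-1} = \pm e \cdot a_2 \xi t_2 \xi^{-1}$, i.e.,

$a_2^{-1} a_1 = \pm e \cdot \xi t_2 t_1^{-1} \xi^{-1}$.

This cannot happen if $a_1 \ne \pm a_2$ and $t_1 = \pm t_2$, and so $t_1 \ne \pm t_2$. In other words,

(4.13) $A^{-1}A \cap \xi T \xi^{-1} \not\subset \{\pm e\}$

for every $\xi \in G$.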

Footnote (on the comparison with the first version of [BGT11] at the start of §4.4): A
simplification to the argument in [BGT11] found by the author at the time of the events
(later incorporated in [BGT11]) turned out to be equivalent to the argument of Pyber
and Szabó.

Choose any $g \in A^{-1}A \cap \xi T \xi^{-1}$ with $g \ne \pm e$. Then $g$ is regular semisimple and its
centralizer $C(g)$ equals $\xi T \xi^{-1}$. (This is particular to $SL_2$; see the comments after
the proof.) Thus, by Cor. 4.4, we obtain that there are $\ge c|A|^{1/3 - O(\delta)}$ elements of
$\xi T \xi^{-1}$ in $A^k$, where $k$ and $c > 0$ are absolute. This is valid for every conjugate
$\xi T \xi^{-1}$ of $T$ with $\xi \in G$. At least $(1/2)|G|/|T|$ maximal tori of $G$ are of the form
$\xi T \xi^{-1}$, $\xi \in G$. Hence

(4.14) $|A^k| \ge \frac{1}{2} \frac{|G|}{|T|} \left(c|A|^{1/3 - O(\delta)} - 2\right) \gg p^2 |A|^{1/3 - O(\delta)}$.

(Since any element of $G$ other than $\pm e$ can lie on at most one maximal torus, there
is no double counting.)
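The two counting facts used in Case (b) can be confirmed by enumeration in a small case. The sketch below assumes $G = SL_2(F_5)$ and $T$ the diagonal torus (our choices); it lists the distinct conjugates $\xi T \xi^{-1}$ and checks that two different conjugates meet only in $\{\pm e\}$.

```python
from itertools import product

p = 5
G = [(a, b, c, d) for a, b, c, d in product(range(p), repeat=4)
     if (a * d - b * c) % p == 1]
T = [(t, 0, 0, pow(t, -1, p)) for t in range(1, p)]     # diagonal torus, order p - 1

def mul(x, y):
    a, b, c, d = x
    e, f, g_, h = y
    return ((a*e + b*g_) % p, (a*f + b*h) % p, (c*e + d*g_) % p, (c*f + d*h) % p)

def inv(x):
    a, b, c, d = x
    return (d, -b % p, -c % p, a)

# Distinct conjugates xi T xi^{-1}, each stored as a frozenset of matrices.
tori = {frozenset(mul(mul(x, t), inv(x)) for t in T) for x in G}
center = {(1, 0, 0, 1), (p - 1, 0, 0, p - 1)}

# No element other than ±e lies on two different conjugates.
no_double_counting = all((S1 & S2) <= center
                         for S1 in tori for S2 in tori if S1 != S2)
print(len(tori), len(G) // (2 * len(T)), no_double_counting)
```

In this example the number of conjugates equals $(1/2)|G|/|T| = 15$ exactly, since the normalizer of $T$ has order $2|T|$.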

From (4.14) it follows immediately that either $|A^3| \ge |A|^{1+\delta}$ (use (2.3)) or $|A| \ge
|G|^{1 - O(\delta)}$. In the latter case, Prop. 4.6 implies that $A^3 = G$.

Case (c): There are pivots and non-pivots in $G$. Since $\langle A \rangle = G$, this implies that
there is a $\xi \in G$, not a pivot, and an $a \in A$ such that $a\xi$ is a pivot. Since $\xi$ is not
a pivot, we obtain (4.13), and so there are $\gg |A|^{1/3 - O(\delta)}$ elements of $\xi T \xi^{-1}$ in $A^k$,
just as before.

At the same time, $a\xi$ is a pivot, i.e., the map $\phi_{a\xi}$ in (4.12) is injective. Hence

$|\phi_{a\xi}(A, \xi^{-1}(A^k \cap \xi T \xi^{-1})\xi)| \ge \frac{1}{4} |A| |A^k \cap \xi T \xi^{-1}| \gg |A|^{4/3 - O(\delta)}$.

Since $\phi_{a\xi}(A, \xi^{-1}(A^k \cap \xi T \xi^{-1})\xi) = A \cdot a (A^k \cap \xi T \xi^{-1}) a^{-1} \subset A^{k+3}$, it follows that

(4.15) $|A^{k+3}| \gg |A|^{4/3 - O(\delta)}$.

We set $\delta$ small enough for Ruzsa's inequality (2.3) to imply that (4.15) contradicts
$|A^3| < |A|^{1+\delta}$. □
One apparent obstacle to a generalization here is the fact that, in higher-rank
groups (e.g. $SL_n$, $n \ge 3$), the centralizer $C(g)$ of an element $g \ne \pm e$ of a torus
$T$ is not necessarily equal to $T$; we have $C(g) = T$ only when $g$ is regular. This
obstacle is not serious here, as the number of non-regular elements of $A$ on a torus
is small by a dimensional bound; this is already in [Hel11, §5.5]. The difficulty in
generalizing Thm. 1.1 to higher-rank groups ([Hel11], [GH11]) resided, in retrospect,
in the fact that the version of Cor. 4.4 in [Hel08, §4] and [Hel11, Cor. 5.10] was slightly
weaker, as discussed before. This made the pivoting argument more complicated and
indirect, and thus harder to generalize; in particular, the sum-product theorem was
still used, in spite of the attempts to gain independence from it in [Hel11, §3].

As pointed out in [BGT11], Thm. 1.1 actually implies the sum-product theorem;
however, it is arguably more natural to deduce the sum-product theorem, or
Prop. [2n from growth in the affine group (Prop. [377|) : multiplication and addition 
correspond to two different group actions. See ^2.3[ 

4.5. High multiplicity and spectral gaps, II. Now that we have proven the
main theorem (Thm. 1.1), we may as well finish our account of growth in linear
groups by going briefly over the proof of Thm. 1.2 (Bourgain-Gamburd), which
gives us expanders. We will keep an eye on how the proof (from [BG08c]) can be
adapted to general $G$.

In §1.1, we said a pair $(G, A_G)$ gives an $\epsilon$-expander if every $S \subset G$ with $|S| \le |G|/2$
satisfies $|S \cup A_G S| \ge (1 + \epsilon)|S|$. For $G = SL_2(Z/pZ)$ and $A_G = A \bmod p$ ($A \subset SL_2(Z)$,
$\langle A \rangle$ Zariski-dense) we will now prove that the second largest eigenvalue $\lambda_1$ of the
adjacency matrix $\mathscr{A}$ (see (4.9), (4.10)) is at most $1 - \epsilon$, where $\epsilon > 0$ depends only
on $A$. This implies the first definition of "$\epsilon$-expander": if $|S \cup A_G S| < (1 + \epsilon)|S|$,
then $f : G \to C$ defined by $f(x) = 1_S(x) - |S|/|G|$ would obey $\langle \mathscr{A} f, f \rangle > (1 - \epsilon)\langle f, f \rangle$
and $\langle f, 1_G \rangle = 0$, a contradiction if $\lambda_1 \le 1 - \epsilon$.

(The two definitions are, in fact, equivalent for $|A|$ bounded; the other direction
of implication is a little subtler - see [Alo86], [AM85], [Dod84], [JS89], [LS88] or
the exposition in [LPW09, §13.3.2]. For $|A|$ arbitrary, the definition in terms of
eigenvalues is equivalent to a slightly different combinatorial definition in terms of
the bottleneck ratio [LPW09, §7.2]. Expanders are, at any rate, mostly of interest
for $|A|$ bounded.)

Proof of Thm. 1.2. Let $S = A \cup A^{-1} \cup \{e\}$. Let $G = SL_2$. Let $\mu$ be the measure on
$G_p = SL_2(Z/pZ)$ given by

$\mu(x) = 1/|S|$ if $x \in S \bmod p$, and $\mu(x) = 0$ otherwise.

We consider the convolutions $\mu^{(k)} = \mu * \mu * \ldots * \mu$. We will see how $|\mu^{(k)}|_2$
decreases as $k$ increases. This happens very quickly at first (stage 1). It then goes
on happening quickly enough (stage 2), thanks to Thm. 1.1 (applied via Prop. 2.5,
the Bourgain-Gamburd "flattening lemma"). Once $|\mu^{(k)}|_2^2$ is quite small (not much
larger than $1/|G_p|$, which is the least it could be), the proof can be finished off by an
argument from [SX91], based on the same high-multiplicity phenomenon that was
exploited in §4.3.

Stage 0: Reduction to $\langle A \rangle$ free. For $G = SL_2$, we can (as in [BG08c]) define
$H = \Gamma(2) = \{g \in G(Z) : g \equiv I \bmod 2\}$; now, $H$ is both free and of finite index in
$G(Z)$; hence $\langle A \rangle \cap H$ is free (since, by the Nielsen-Schreier theorem, every subgroup
of a free group is free), Zariski-dense, and generated by a set $A' \subset \langle A \rangle$ of bounded
size (Schreier generators). We can thus replace $A$ by $A'$ (at the cost of at most a
constant factor - depending on $A$ and $A'$ - in the final bounds), and assume from
now on that $\langle A \rangle$ is free.
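Freeness, and the fact that short words survive reduction mod $p$, can be observed concretely. The sketch below uses the two matrices $\begin{pmatrix}1&2\\0&1\end{pmatrix}$, $\begin{pmatrix}1&0\\2&1\end{pmatrix}$ of $\Gamma(2)$, which freely generate a free group by a classical theorem of Sanov; these generators, the word length 5 and the prime 487 are our choices for illustration and are not taken from the source.

```python
def mul(x, y):
    a, b, c, d = x
    e, f, g, h = y
    return (a*e + b*g, a*f + b*h, c*e + d*g, c*f + d*h)   # integer 2x2 product

GENS = {"a": (1, 2, 0, 1), "A": (1, -2, 0, 1), "b": (1, 0, 2, 1), "B": (1, 0, -2, 1)}
INV = {"a": "A", "A": "a", "b": "B", "B": "b"}

def reduced_words(k):
    """All reduced words of length k (no letter followed by its inverse)."""
    words = [""]
    for _ in range(k):
        words = [w + s for w in words for s in GENS if not w or INV[w[-1]] != s]
    return words

def evaluate(word):
    m = (1, 0, 0, 1)
    for s in word:
        m = mul(m, GENS[s])
    return m

K_MAX = 5
words = [w for k in range(K_MAX + 1) for w in reduced_words(k)]
matrices = {evaluate(w) for w in words}
max_entry = max(abs(v) for m in matrices for v in m)

p = 487                                   # a prime larger than 2 * 3**K_MAX
mod_p = {tuple(v % p for v in m) for m in matrices}
print(len(words), len(matrices), max_entry, len(mod_p))
```

Distinct reduced words give distinct integer matrices, and since all entries are at most $3^k$ in absolute value, they stay distinct modulo any prime $p > 2 \cdot 3^k$; this is the mechanism behind the restriction to words of length $k \le c \log p$.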

(For general $G$, the task is much more delicate, since such a convenient $H$ does
not in general exist, and also because the "concentration in subgroups" issue we will
discuss below requires stronger inputs to be addressed successfully - Zariski density
no longer seems enough (given current methods). See [GV12] for a general solution.
An approach via products of random matrices is also possible [BG08a], [BG09].)

Stage 1. We can now assume that $\langle A \rangle$ is a free group on $r \ge 2$ elements. By
the argument we went over in the introduction (shortly before the statement of
Thm. 1.2), there is a constant $c$ depending only on $A$ such that two words on $A$
of length $k \le c \log p$ reduce to the same element of $G(Z/pZ)$ only if they give the
same element of $G(Z)$; since $\langle A \rangle$ is free, this can happen only if they have the same
reduction (e.g., $x x^{-1} y z = y w^{-1} w z$). Thus, for instance,

$\mu^{(k)}(e) = \frac{|\{\text{words of length } k \text{ reducing to the identity}\}|}{|S|^k}$,

where $\mu^{(k)} = \mu * \mu * \cdots * \mu$ ($k$ times). Hence, Kesten's bound on the number of words
of given length reducing to the identity [Kes59] gives us, for any $\epsilon > 0$, a bound of
the form $\mu^{(k)}(e) \ll_\epsilon (\rho + \epsilon)^k$ with $\rho < 1$ depending only on $r$, and so, for $k = \lfloor c \log p \rfloor$,

$\mu^{(k)}(e) \ll \frac{1}{p^{\eta}}$,

where $\eta > 0$ depends only on $c$, and thus only on $A$.
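Kesten's bound can be illustrated by counting words directly. The sketch below makes a simplifying assumption that differs slightly from the text: it counts words in the $2r$ letters $A \cup A^{-1}$ only (omitting the identity letter in $S$), i.e. closed walks from the root of the $2r$-regular tree, and compares the count with $(2\sqrt{2r-1})^k$, the $k$-th power of the spectral radius of that tree. The function names are ours.

```python
import math
from collections import defaultdict

def closed_words(r, k):
    """Number of words of length k in 2r free letters that reduce to the identity."""
    d = 2 * r
    counts = {0: 1}            # counts[h]: walks so far that end at distance h from the root
    for _ in range(k):
        new = defaultdict(int)
        for h, c in counts.items():
            if h == 0:
                new[1] += c * d            # every letter moves away from the root
            else:
                new[h + 1] += c * (d - 1)  # d - 1 letters lengthen the reduced word
                new[h - 1] += c            # one letter cancels the last letter
        counts = new
    return counts.get(0, 0)

def kesten_bound(r, k):
    """Upper bound (2 * sqrt(2r - 1))**k on the number of closed walks of length k."""
    return (2 * math.sqrt(2 * r - 1)) ** k

for k in (2, 4, 10, 20):
    print(k, closed_words(2, k), round(kesten_bound(2, k)), 4 ** k)
```

For $r = 2$ the proportion of words of length $k$ reducing to the identity is at most $(\sqrt{3}/2)^k$, which decays exponentially; with $k$ of size $c \log p$ this is a negative power of $p$.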

It turns out that, using the fact that $\langle A \rangle$ is free, we can show not just that $\mu^{(k)}(e)$
is small, but that $\mu^{(k)}(G')$ is small for any proper subgroup $G'$ of $G$. For $G = SL_2$,
this is relatively straightforward: every proper subgroup $G'$ of $G_p = SL_2(Z/pZ)$ is
almost solvable, i.e., contains a solvable subgroup $G''$ of bounded index. It is enough
to show that $\mu^{(k)}(G'')$ is small (as this implies immediately that $\mu^{(k)}(G')$ is small,
by pigeonhole). Because we are in $SL_2$, $G''$ is not just solvable but 2-step solvable,
i.e., any elements $g_1, g_2, g_3, g_4 \in G''$ must satisfy

(4.16) $[[g_1, g_2], [g_3, g_4]] = e$.

By the same idea as before, for $k \le c \log p$, $c$ small enough, this is possible only
if $g_1, g_2, g_3, g_4$ are projections mod $p$ of elements of $\langle A \rangle$ that also satisfy (4.16).
However, any set $S$ of words of length $\le \ell$ in a free group such that all 4-tuples of
elements of $S$ satisfy (4.16) must be of size $\ll \ell^{O(1)}$, by a simple argument [BG08c,
Prop. 8 and Lem. 3] based on the fact that the centralizer of a non-trivial element
in a free group is cyclic: the centralizer is a free group (being a subgroup of a free
group) but it cannot be of rank $\ge 2$, as it satisfies a non-trivial relation. Hence
$\mu^{(2k)}(G'')$, and thus $\mu^{(k)}(G')$, is indeed small:

$\mu^{(k)}(G'') \le \sqrt{\mu^{(2k)}(G'')} \ll \frac{1}{p^{\eta}}$,

where $\eta > 0$ depends only on $c$, and thus only on $A$.
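The relation (4.16) can be tested on the most familiar solvable subgroup. The sketch below assumes $p = 5$ and takes the Borel subgroup of upper triangular matrices in $SL_2(F_5)$ as the example of a 2-step solvable group (our choice, for illustration); it also checks that (4.16) fails in the whole group.

```python
from itertools import product

p = 5
G = [(a, b, c, d) for a, b, c, d in product(range(p), repeat=4)
     if (a * d - b * c) % p == 1]
B = [x for x in G if x[2] == 0]           # Borel subgroup: upper triangular matrices

def mul(x, y):
    a, b, c, d = x
    e_, f, g_, h = y
    return ((a*e_ + b*g_) % p, (a*f + b*h) % p, (c*e_ + d*g_) % p, (c*f + d*h) % p)

def inv(x):
    a, b, c, d = x
    return (d, -b % p, -c % p, a)

def comm(x, y):
    """Commutator [x, y] = x y x^{-1} y^{-1}."""
    return mul(mul(x, y), mul(inv(x), inv(y)))

e = (1, 0, 0, 1)
comm_B = {comm(x, y) for x in B for y in B}      # commutators of the Borel subgroup
comm_G = {comm(x, y) for x in G for y in G}      # commutators of the whole group

borel_ok = all(comm(u, v) == e for u in comm_B for v in comm_B)
whole_group_fails = any(comm(u, v) != e for u in comm_G for v in comm_G)
print(len(B), len(comm_B), borel_ok, whole_group_fails)
```

Commutators of upper triangular matrices are unipotent and hence commute with each other, which is why (4.16) holds on the Borel subgroup; in the full group it cannot hold, since $SL_2(F_5)$ is perfect and non-abelian.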

(For general $G$, showing that there is no concentration in a proper subgroup $G'$ is
a much more delicate matter. A fully general solution was given by [GV12] ("escape
in mass from subvarieties").)

Stage 2. We are in the case in which one of the main results in this survey
(Thm. 1.1) will be applied (via Prop. 2.5). Consider $\mu^{(k)}$, $\mu^{(2k)}$, $\mu^{(4k)}$, etc. At each
step, we apply Prop. 2.5 (the flattening lemma) with $K = p^{\delta'}$, $\delta' > 0$ to be set later.
If (2.4) fails every time, we obtain $|\mu^{(2^r k)}|_2^2 \le 1/|G|^{1-\delta}$ after $r = O_{\delta'}(1)$ steps; we then
go to stage 3.

Suppose (2.4) holds for some $k' = 2^j k$, $j \ll_{\delta', \eta} 1$. Then Prop. 2.5 gives that there
is a $p^{O(\delta')}$-approximate subgroup $H \subset G$ of size $\ll p^{O(\delta')} / |\mu^{(k')}|_2^2$ and an element
$g \in G$ such that $\mu^{(k')}(Hg) \gg p^{-O(\delta')}$. In particular, $|H^3| \le |H|^{1 + O(\delta'/\eta)}$. Choosing
$\delta' > 0$ small enough, we get a contradiction to Thm. 1.1 unless $|H| \ge |G|^{1 - O(\delta)}$
(where we can make $\delta$ as small as we want) or $H$ is contained in a proper subgroup
$G'$ of $G$.

If $|H| \ge |G|^{1 - O(\delta)}$, then $|H| \ll p^{O(\delta')} / |\mu^{(k')}|_2^2$ implies that $|\mu^{(k')}|_2^2 \ll 1/|G|^{1 - O(\delta) - O(\delta')}$,
and we go to stage 3. Assume, then, that $H$ is contained in a proper subgroup
$G'$ of $G$. Then $\mu^{(k')}(G'g) \gg |G|^{-O(\delta')}$. This implies that $\mu^{(k)}(G'g') \gg |G|^{-O(\delta')}$ for some
$g' \in G$ (simply because $\mu^{(k')} = \mu^{(k)} * \mu^{(k'-k)}$), i.e., $\mu^{(k)}$ is concentrated in a coset of a
subgroup, in contradiction (once we set $\delta'$ small enough) with what we proved in Stage 1.

Footnote (on Kesten's bound [Kes59] in Stage 1): The simple bound [BS87, Lem. 2]
suffices for $r \ge 3$.

Stage 3. We have got to an $\ell = k' \le 2^{O_{\delta'}(1)} k \ll_{\delta} \log p \le \log |G|$ such that
$|\mu^{(\ell)}|_2^2 \le 1/|G|^{1-\delta}$, where $\delta > 0$ is as small as we want. This is still weaker than
a bound on $\ell^2$ mixing time (meaning an $\ell$ such that $|\mu^{(\ell)} - 1/|G||_2^2 \le \epsilon/|G|$), which
is itself, in general, weaker than expansion. Let us see, however, how to get to
expansion by using the high multiplicity of eigenvalues (§4.3). (This is as in
Sarnak-Xue [SX91].) The trace of $\mathscr{A}^{2\ell}$ is, on the one hand, $|G| |\mu^{(\ell)}|_2^2$ (by definition of trace,
since the probability of $x$ returning to $x$ after $2\ell$ steps of a random walk is $|\mu^{(\ell)}|_2^2$),
and, on the other, $\sum_j \lambda_j^{2\ell}$ (sum of eigenvalues). Hence

$m_1 \lambda_1^{2\ell} \le \sum_j \lambda_j^{2\ell} = |G| |\mu^{(\ell)}|_2^2 \le |G|^{\delta}$,

where $m_1$ is the multiplicity of $\lambda_1$. As we saw in §4.3, $m_1 \ge (p - 1)/2 \gg |G|^{1/3}$.
Thus, $\lambda_1^{2\ell} \ll |G|^{\delta - 1/3}$. We set $\delta < 1/3$ ($\delta = 1/6$, say) and obtain that $\lambda_1 \le 1 - \epsilon$,
where $\epsilon > 0$ depends only on $A$ (and $\delta$, which is now fixed). □

Applications have called for generalizing Thm. 1.2 in two directions. One is that
of changing the Lie type. Here the first step was taken by Bourgain and Gamburd
themselves [BG08a]; a fully general statement for all perfect G is due to Varjú and
Salehi-Golsefidy [GV12]. (We have already discussed one of the main issues involved
in a generalization, namely, avoiding concentration in subgroups.) The other kind
of generalization consists in changing the ground ring. For many applications, the
most important change turns out not to be changing $\mathbb{F}_p$ for $\mathbb{F}_q$, but changing Z/pZ
for Z/dZ. (This is needed for the affine sieve [BGS10], one of the main ways
in which results in the area get applied nowadays.) For $\mathrm{SL}_2$ and d square-free, this
was done in [BGS10]; [Var12] and [GV12] solved the problem for G general and d
square-free; [BG08b] addressed $\mathrm{SL}_2$ and d a prime power, and [BG09] did the same for
$\mathrm{SL}_n$ and d a prime power. So far the only result for general moduli d is [BV12], which
treats $\mathrm{SL}_n$; the case of G general and d general is not yet finished.

5. Growth in permutation groups 

5.1. Introduction. Our aim now will be to give some of the main ideas in the proof
of quasipolynomial diameter for all Cayley graphs of the symmetric and alternating
groups (Thm. 1.3).

The proof uses much of the foundational material we gave earlier. The structure of
the proof is very different from that for linear algebraic groups, however. In
particular, we do not have access to dimensional bounds (since there is no clear meaning
to dimension in a permutation group) or to escape-from-subvarieties arguments
(essentially for the same reason).

Additional tools come from two sources. There were existing results on diameters
of permutation groups: among those that were particularly useful, [BS92, Thm.
1.1] reduces the problem to that for $\langle A \rangle = \mathrm{Alt}(n)$ (intuitively the hardest case)
and [BBS04] proves a polynomial diameter bound provided that A contains at least
one element other than the identity with small support. Note that [BBS04] already
uses the fact that even a very short random walk in Sym(n) takes an element of
$[n] = \{1, 2, \dots, n\}$ to any other element with almost uniform distribution.

Another key source of ideas came from existing work on Classification-free results
on subgroups of Sym(n). The Classification of Finite Simple Groups is a result
whose first proof spanned many volumes; its acceptance was gradual - even the
date of its completion, at some point in the 80s, is unclear. Thus, there was an
interest in what [Bab82] called "intelligible proofs of the asymptotic consequences of
the Classification Theorem". Work in this direction includes [Bab82], [Pyb93] and
[LP11].

Precisely because this work had combinatorial, relatively elementary bases, it
turned out to be very robust: that is, these are results on the size of subgroups
that can be generalized to any subsets that grow slowly under multiplication. (The
basic idea here is as in the orbit-stabilizer theorem and its consequences (§3.1):
these are bounds on the size of a subgroup H that are based on maps or processes
that multiply H by itself a few times ($\le k$ times, say); thus, if instead of having a
subgroup H we have a set A, we still have a result - just one where H gets replaced
sometimes by $A$ and sometimes by $A^k$.)

Just as a generalization of [LP11] played an important role in both [HW08] and
[BGT11], a generalization of [Bab82] and [Pyb93] plays an important role in [HS].
What [HS] uses is not the final result in [Bab82] (namely, that a doubly transitive
group $G \le S_n$ other than $A_n$ or $S_n$ satisfies $|G| \le \exp(\exp(1.18\sqrt{\log n}))$ for n
greater than a constant [Bab82, Thm. 1.1]; Pyber [Pyb93] improved this to
$|G| \le n^{O((\log n)^2)}$), but rather an intermediate result,
the "splitting lemma". This is a result based on what is called the probabilistic
method in combinatorics (generally, as in [AS00], traced back to Erdős). This method
is based on the observation that, if we show that something happens with positive
probability, then it happens sometimes; thus, if we impose a convenient distribution
(often the uniform one) on some initial objects, and we obtain that they then satisfy
a certain property with positive probability, we have shown that a configuration of
objects satisfying the property exists. The objects in [Bab82] are elements of a group
H. Now, we, in [HS], do not have the right to choose elements of H at random; to
do so would be to assume what we are trying to prove, namely, a small bound on
the diameter. Instead (as in [BBS04]) we mimic the effects of a uniform distribution
by means of a random walk; since the set {1, 2, ..., n} being acted upon is small, a
short random walk is enough to give a distribution very close to uniform.

The proof in [HS] contains many other elements; a full outline is given in [HS, §1.5].
Here, let us focus on a crucial part: the generalization of Babai's splitting lemma,
and its application by means of the orbit-stabilizer theorem to create elements of
$A^k$ in small subgroups of Sym(n). We will then look at a different part of the proof,
giving a result of independent interest (Prop. 5.8) on small generating sets. This
will demonstrate how different random processes - not just random walks on graphs
- can be used to give explicit results on growth and generation.

5.1.1. Notation. As we said earlier, we will follow here the sort of notation that is
current in the literature on permutation groups: actions $G \curvearrowright X$ are by default on
the right, $x^g$ means $g(x)$ (so that $x^{gh} = (x^g)^h$), $x^A$ is the orbit of $x$ under $A \subset G$,
and $G_x$ means Stab(x). There are two different kinds of stabilizers of a set $S \subset X$:
the setwise stabilizer

\[ G_S = \{g \in G : S^g = S\} \]

and the pointwise stabilizer

\[ G_{(S)} = \{g \in G : x^g = x \ \forall x \in S\}. \]

(The notation here is as in [DM96] and [Ser03], not as in [Wie64].)
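
To make the two kinds of stabilizer concrete, here is a minimal brute-force sketch for G = Sym(5). The encoding is an assumption made only for this illustration: points are 0, ..., n-1 (rather than 1, ..., n) and a permutation g is a tuple with g[x] standing for x^g; the function names are hypothetical.

```python
from itertools import permutations

def setwise_stabilizer(group, S):
    """Elements g with S^g = S (points of S may be permuted among themselves)."""
    S = frozenset(S)
    return [g for g in group if frozenset(g[x] for x in S) == S]

def pointwise_stabilizer(group, S):
    """Elements g with x^g = x for every x in S."""
    return [g for g in group if all(g[x] == x for x in S)]

n = 5
sym_n = list(permutations(range(n)))   # g[x] is the image x^g
S = {0, 1, 2}
G_S = setwise_stabilizer(sym_n, S)     # setwise stabilizer
G_pS = pointwise_stabilizer(sym_n, S)  # pointwise stabilizer
index = len(G_S) // len(G_pS)          # number of cosets of the pointwise in the setwise stabilizer
```

For |S| = m, the pointwise stabilizer has index m! in the setwise stabilizer; this count is used later, in §5.3.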

5.2. Random walks and elements of small support. We will start with some
basic material on random walks. We will then be able to go briefly over the proof
of [BBS04].

Let us start by defining our terms. We will work with a directed multigraph $\Gamma$, i.e.,
a graph where the edges are directed (i.e., they are arrows) and may have multiplicity
> 1. (The setting we will work in now is more general than that of Cayley graphs.)
We assume that $\Gamma$ is strongly connected (i.e., there is a path respecting the arrows
between any two points in the vertex set $V(\Gamma)$), regular of valency $d$ (i.e., there are $d$
arrows (counted with multiplicity) going out of every vertex in $V(\Gamma)$) and symmetric
(i.e., the number of arrows from $x$ to $y$ is the same as the number of arrows from $y$
to $x$, counting with multiplicity in both cases).

We will study a lazy random walk: a particle moves from vertex to vertex, and
at each point in time, if it is at a vertex $x$, and the arrows going out of $x$ end at the
vertices $x_1, x_2, \dots, x_d$ (with repetitions possible), the particle decides to be lazy (i.e.,
stays at $x$) with probability 1/2, and moves to $x_i$ with probability $1/2d$. (Studying
a lazy random walk is a well-known trick used to avoid the possible effects of large
negative eigenvalues of the adjacency matrix.)

Let $x, y \in V(\Gamma)$. We write $p_k(x,y)$ for the probability that a particle is at vertex
$y$ after $k$ steps of a lazy random walk starting at $x$. For $\epsilon > 0$ given, the $\ell_\infty$-mixing
time for $\epsilon$ is the least $k$ such that

\[ \left| p_k(x,y) - \frac{1}{|V(\Gamma)|} \right| \le \frac{\epsilon}{|V(\Gamma)|} \]

for all $x, y \in V(\Gamma)$. (A mixing time is a time starting at which the outcome of a random
walk is very close to a uniform distribution; the norm (e.g. $\ell_\infty$) and the tolerance ($\epsilon$)
have to be specified (as they are here) for "very close" to have a precise meaning.)

As before, the normalized adjacency matrix $\mathscr{A}$ is the operator taking a function
$f : V(\Gamma) \to \mathbb{C}$ to the function $\mathscr{A}f : V(\Gamma) \to \mathbb{C}$ defined by

\[ \mathscr{A}f(x) = \frac{1}{d} \sum_{\text{arrows } v \text{ with } \mathrm{tail}(v) = x} f(\mathrm{head}(v)). \]

(An arrow goes from its tail to its head.) Let $f_0, f_1, f_2, \dots$ be a full set of eigenvectors
corresponding to the eigenvalues $1 = \lambda_0 \ge \lambda_1 \ge \lambda_2 \ge \dots$ of $\mathscr{A}$. If $\Gamma$ is regular and
symmetric, then $\mathscr{A}$ is a symmetric operator, and so all $\lambda_j$ are real and all $f_j$ are
orthogonal, and we can also assume all the $f_j$ to be real-valued.
The following fact is well-known. 

Lemma 5.1. Let $\Gamma$ be a connected, regular and symmetric multigraph of valency $d$
and with $N$ vertices. Then the $\ell_\infty$ mixing time for $\epsilon$ is at most $N^2 d \log(N/\epsilon)$.

The proof contains two steps: a trivial bottleneck bound gives a lower bound on
the eigenvalue gap $\lambda_0 - \lambda_1 = 1 - \lambda_1$, and a lower bound on the eigenvalue gap gives
an upper bound on the mixing time.

Proof. Let $f_1$ be an eigenvector corresponding to $\lambda_1$; since $f_1$ is orthogonal to the
constant function $f_0$, the maximum $r_+$ and the minimum $r_-$ of $f_1(x)$ obey $r_+ > 0 > r_-$.
By pigeonhole, there is an $r \in (r_-, r_+]$ such that there are no $x \in V(\Gamma)$ with
$f_1(x) \in [r - \eta, r)$, where $\eta = (r_+ - r_-)/N$. Let $S = \{x \in V(\Gamma) : f_1(x) \ge r\}$. Clearly,
$S$ is neither empty nor equal to all of $V(\Gamma)$.

Since $\Gamma$ is connected, there is at least one $x \in S$ with at least one arrow starting
at $x$ and ending outside $S$. (This is the same as saying that the bottleneck of a
connected graph is $\ge 1/Nd$.) Hence

(5.1) \[ \sum_{x \in S} \mathscr{A}f_1(x) \le -\frac{\eta}{d} + \sum_{x \in S} f_1(x). \]

Again by $\langle f_0, f_1 \rangle = 0$, the average of $f_1(x)$ over $S$ is > 0; trivially, it is also $\le r_+$.
Thus, since $\eta > r_+/N$ and $\sum_{x \in S} \mathscr{A}f_1(x) = \lambda_1 \sum_{x \in S} f_1(x)$, (5.1) gives us

\[ \lambda_1 \sum_{x \in S} f_1(x) \le \left(1 - \frac{1}{N|S|d}\right) \sum_{x \in S} f_1(x). \]

Therefore, $\lambda_1 \le 1 - 1/N|S|d \le 1 - 1/N^2 d$.

This implies the desired bound on the mixing time (exercise) by an idea already
used in §4.3: every step of the random walk multiplies the vector describing the
probability distribution of the particle by $(\mathscr{A} + I)/2$, and so anything orthogonal to
a constant function gets multiplied by $\lambda_1 = 1 - 1/N^2 d$ (or less) repeatedly. □
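
As a sanity check of Lemma 5.1 on a toy case, the following sketch computes the exact $\ell_\infty$-mixing time of the lazy walk on a cycle with N vertices (valency d = 2) and compares it with $N^2 d \log(N/\epsilon)$. The choice of graph, the adjacency-list encoding and the function names are assumptions made for this illustration only.

```python
import math

def lazy_step(dist, out_neighbors, d):
    """One step of the lazy walk: stay w.p. 1/2, follow each of the d arrows w.p. 1/(2d)."""
    new = [0.5 * p for p in dist]
    for x, p in enumerate(dist):
        for y in out_neighbors[x]:
            new[y] += p / (2 * d)
    return new

def linf_mixing_time(out_neighbors, d, eps, max_steps=10**5):
    """Least k with |p_k(x, y) - 1/N| <= eps/N for all vertices x, y."""
    N = len(out_neighbors)
    # dists[x] is the distribution of the walk started at x
    dists = [[1.0 if y == x else 0.0 for y in range(N)] for x in range(N)]
    for k in range(max_steps + 1):
        if all(abs(p - 1.0 / N) <= eps / N for row in dists for p in row):
            return k
        dists = [lazy_step(row, out_neighbors, d) for row in dists]
    raise RuntimeError("did not mix within max_steps")

N, d, eps = 8, 2, 0.1
cycle = [[(x - 1) % N, (x + 1) % N] for x in range(N)]  # symmetric, regular, connected
k = linf_mixing_time(cycle, d, eps)
bound = N**2 * d * math.log(N / eps)
```

On such a small, well-connected graph the actual mixing time is far below the bound; the lemma is useful because it needs no information about the graph beyond N and d.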

See the proof of [HS, Lem. 4.1] (or any of many other sources, e.g., [Lov96, Thm. 5.1]) for a
solution to the exercise. It is easy to do this suboptimally and obtain an extra factor of $N$,
namely, $|p_k(x,y) - 1/N| \le N\lambda_1^k$ instead of $|p_k(x,y) - 1/N| \le \lambda_1^k$.

Lemma 5.1 may look weak, but it is actually quite useful for $\Gamma$ small, i.e., graphs
with small vertex sets. When we work with a permutation group $G \le \mathrm{Sym}(n)$, we
may not have all the geometry we had at hand when working with linear algebraic
groups, but we do have something else - an action on the small set {1, 2, ..., n} (and
tuples thereof); that action gives rise to graphs with small vertex sets, allowing us
to use Lemma 5.1.

First, we prove that we can mimic the uniform distribution on k-tuples by
relatively short random walks. This is just as in [BBS04, §2].

Lemma 5.2. Let $G$ be a k-transitive subgroup of Sym(n). Let $A$ be a set of
generators of $G$. Then there is a subset $A_0 \subset A \cup A^{-1}$ such that the following holds.

For $\epsilon > 0$ arbitrary, for any $\ell \ge 4n^{2k+1}\log(n^k/\epsilon)$ and for any k-tuples $\vec{x}$, $\vec{y}$ of
distinct elements of {1, 2, ..., n}, the probability that the outcome $g \in \langle A_0 \rangle$ of a lazy
random walk of length $\ell$ (on the graph $\Gamma(\langle A_0 \rangle, A_0)$, starting at $e$) takes $\vec{x}$ to $\vec{y}$ is at
least $(1 - \epsilon)(n-k)!/n!$ and at most $(1 + \epsilon)(n-k)!/n!$.

The number of k-tuples of distinct elements of {1, 2, ..., n} is, of course, $n!/(n-k)!$.

Proof. Since $A$ may be large, and it will be best to work with a generating set
that is not very large, we start by choosing a subset of $A$ that still generates $G$.
This we do simply by choosing an element $g_1 \in A$, and then an element $g_2 \in A$
such that $\langle g_1 \rangle \lneq \langle g_1, g_2 \rangle$, and then a $g_3 \in A$ such that $\langle g_1, g_2 \rangle \lneq \langle g_1, g_2, g_3 \rangle$, etc.,
until we get $\langle g_1, g_2, \dots, g_r \rangle = \langle A \rangle = G$ ($r \ge 1$). Since the longest subgroup chain
in Sym(n) is of length $\le 2n - 3$ [Bab86], we see that $r \le 2n - 2 < 2n$. Let

\[ A_0 = \{g_1, g_1^{-1}, \dots, g_r, g_r^{-1}\}. \]

Now define the multigraph $\Gamma$ by letting the set of vertices consist of all k-tuples
of distinct elements of {1, 2, ..., n}; draw an arrow between $\vec{z} = (z_1, z_2, \dots, z_k)$ and
$(z_1^a, z_2^a, \dots, z_k^a)$ for every vertex (i.e., k-tuple) $\vec{z}$ and every $a \in A_0$. Finish by applying
Lem. 5.1. □
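
The multigraph on k-tuples used in this proof is small enough to simulate exactly. The sketch below does so for n = 4, k = 2, under assumptions made only for this illustration: points are 0, ..., n-1, a permutation g is a tuple with g[x] = x^g, and the generating set (a transposition and a 4-cycle, with inverses) is an arbitrary choice generating Sym(4). The walk length is taken from the bound in the statement of Lemma 5.2.

```python
import math
from itertools import permutations

def act(tup, g):
    """Right action on tuples: (z_1, ..., z_k)^g = (z_1^g, ..., z_k^g)."""
    return tuple(g[z] for z in tup)

def inverse(g):
    inv = [0] * len(g)
    for x, y in enumerate(g):
        inv[y] = x
    return tuple(inv)

def tuple_walk_distribution(n, k, gens, start, steps):
    """Exact distribution of start^w, where w is a lazy random walk of length `steps`."""
    d = len(gens)
    dist = {t: 0.0 for t in permutations(range(n), k)}
    dist[start] = 1.0
    for _ in range(steps):
        new = {t: 0.5 * p for t, p in dist.items()}   # lazy half
        for t, p in dist.items():
            for g in gens:
                new[act(t, g)] += p / (2 * d)          # moving half
        dist = new
    return dist

n, k, eps = 4, 2, 0.05
a = (1, 0, 2, 3)                       # transposition (0 1)
b = (1, 2, 3, 0)                       # 4-cycle
A0 = [a, inverse(a), b, inverse(b)]    # symmetric generating multiset
steps = math.ceil(4 * n**(2 * k + 1) * math.log(n**k / eps))
dist = tuple_walk_distribution(n, k, A0, start=(0, 1), steps=steps)
uniform = math.factorial(n - k) / math.factorial(n)
worst = max(abs(p - uniform) for p in dist.values())
```

Here `worst` is the largest deviation from the uniform value $(n-k)!/n!$ over all target tuples; the lemma asserts it is at most $\epsilon (n-k)!/n!$.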

We will now see how to adapt the probabilistic method using Lem. 5.2 to
approximate a uniform distribution by a short random walk.

Proposition 5.3 ([BBS04]). Let $A \subset \mathrm{Sym}(n)$ generate a 3-transitive subgroup of
Sym(n). Let $g \in A^{\ell_0}$, $\ell_0 \ge 1$ arbitrary. Assume that $0 < |\mathrm{supp}(g)| < n$. Then, for
any $\epsilon > 0$, there is an element $g' \in (A \cup A^{-1})^{4(\ell_0 + \ell)}$, $\ell \ll n^7 \log(n/\epsilon)$, $g' \ne e$, such
that

\[ |\mathrm{supp}(g')| \le 3 + 3(1 + O^*(\epsilon))|\mathrm{supp}(g)|^2/(n-2), \]

where the implied constant is absolute.

The conclusion is non-trivial only when $|\mathrm{supp}(g)| < n/3$.

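Proof. Given $\sigma \in \mathrm{Sym}(n)$ and $x \in \{1, 2, \dots, n\}$, let $h = \sigma^{-1} g \sigma$; thus, $\mathrm{supp}(h) =
(\mathrm{supp}(g))^\sigma$. When is $x$ in the support of the commutator $[g, h] = g^{-1}h^{-1}gh$? (Defining
the commutator in this way is standard in the study of permutation groups.) There
are three possibilities:

(a) $x \in \mathrm{supp}(g)$ and $x^{g^{-1}} \in \mathrm{supp}(h)$, i.e., $x^{g^{-1}} \in \mathrm{supp}(g) \cap (\mathrm{supp}(g))^\sigma$;

(b) $x \in \mathrm{supp}(g)$, $x^{g^{-1}} \notin \mathrm{supp}(h)$ and $x \in \mathrm{supp}(h)$, and so, in particular, $x \in
\mathrm{supp}(g) \cap (\mathrm{supp}(g))^\sigma$;

(c) $x \notin \mathrm{supp}(g)$, $x \in \mathrm{supp}(h)$ and $x^{h^{-1}} \in \mathrm{supp}(g)$, and so, in particular, $x^{h^{-1}} \in
\mathrm{supp}(g) \cap (\mathrm{supp}(g))^\sigma$.

(A remark on the subgroup-chain bound used in the proof of Lemma 5.2: the trivial bound
is $(\log n!)/\log 2$. In a subgroup chain $H_1 < H_2 < \dots < H_k$, we have
$|H_2| \ge 2|H_1|$, $|H_3| \ge 2|H_2|$, etc., simply because the index of a proper subgroup of a group is
always $\ge 2$.)

Thus, $\mathrm{supp}([g, h])$ is contained in

\[ (\mathrm{supp}(g) \cap (\mathrm{supp}(g))^\sigma) \cup (\mathrm{supp}(g) \cap (\mathrm{supp}(g))^\sigma)^g \cup (\mathrm{supp}(g) \cap (\mathrm{supp}(g))^\sigma)^h. \]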
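
The containment just stated, and the fact that the average of $|S \cap S^\sigma|$ over a uniformly random $\sigma$ equals $|S|^2/n$, can be checked by brute force for small n. The sketch below does this for n = 6; the encoding (points 0, ..., n-1, g[x] = x^g, right actions) and the particular choice of g are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import permutations

def compose(a, b):
    """Right-action product ab: x^(ab) = (x^a)^b."""
    return tuple(b[a[x]] for x in range(len(a)))

def inverse(a):
    inv = [0] * len(a)
    for x, y in enumerate(a):
        inv[y] = x
    return tuple(inv)

def supp(a):
    return {x for x, y in enumerate(a) if x != y}

def image(points, a):
    return {a[x] for x in points}

def commutator(g, h):
    """[g, h] = g^-1 h^-1 g h."""
    return compose(compose(inverse(g), inverse(h)), compose(g, h))

n = 6
g = (1, 2, 0, 3, 4, 5)                  # a 3-cycle, so |supp(g)| = 3
S = supp(g)
total, count, contained = Fraction(0), 0, True
for sigma in permutations(range(n)):
    h = compose(compose(inverse(sigma), g), sigma)   # h = sigma^-1 g sigma
    T = S & image(S, sigma)                          # supp(g) intersected with supp(g)^sigma
    cover = T | image(T, g) | image(T, h)
    contained = contained and supp(commutator(g, h)) <= cover
    total += len(T)
    count += 1
average = total / count
```

With a truly uniform $\sigma$ the average is exactly $|S|^2/n$; the point of the proof is that a short random walk gives the same value up to a factor $1 + O^*(\epsilon)$.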

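Now let $\sigma \in (A \cup A^{-1})^{\ell'}$ be the outcome of a lazy random walk of length $\ell' =
\lceil 4n^{2k+1}\log(n^k/\epsilon) \rceil$, $k = 3$. Lemma 5.2 tells us that, for any $x, x' \in \{1, 2, \dots, n\}$, $\sigma$
will take $x$ to $x'$ with probability between $(1 - \epsilon)/n$ and $(1 + \epsilon)/n$. Since expectation
is additive, it follows that, for every $S \subset \{1, 2, \dots, n\}$,

\[ \mathbb{E}(|S \cap S^\sigma|) = \sum_{x' \in S} \mathrm{Prob}(x' \in S^\sigma) = \sum_{x' \in S} \sum_{x \in S} \mathrm{Prob}(x^\sigma = x')
   = \sum_{x' \in S} \sum_{x \in S} \frac{1 + O^*(\epsilon)}{n} = (1 + O^*(\epsilon)) \frac{|S|^2}{n}. \]

Writing $S = \mathrm{supp}(g)$, we see that

\[ \mathbb{E}(|\mathrm{supp}([g, h])|) \le \mathbb{E}(|S \cap S^\sigma|) + \mathbb{E}(|(S \cap S^\sigma)^g|) + \mathbb{E}(|(S \cap S^\sigma)^h|)
   = 3 \cdot \mathbb{E}(|S \cap S^\sigma|) \le 3(1 + O^*(\epsilon)) \frac{|S|^2}{n} = 3(1 + O^*(\epsilon)) \frac{|\mathrm{supp}(g)|^2}{n}. \]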

Now we could conclude that there exists a $\sigma \in (A \cup A^{-1})^{\ell'}$ such that $|\mathrm{supp}([g, h])|$
is at most $3(1 + O^*(\epsilon))|\mathrm{supp}(g)|^2/n$. We forgot to take care of one detail, however:
$[g, \sigma^{-1}g\sigma]$ could be the identity. Fortunately, $\ell'$ is large enough that Lemma 5.2
assures us that, even if we specify that $y_i^\sigma = y_i'$ for some $y_i, y_i' \in \{1, 2, \dots, n\}$ ($i = 1, 2$;
$y_1 \ne y_2$, $y_1' \ne y_2'$), the probability that $x^\sigma = x'$ for $x, x' \in \{1, 2, \dots, n\}$ ($x \ne y_i$,
$x' \ne y_i'$ for $i = 1, 2$) is $(1 + O^*(\epsilon))/(n - 2)$. (This is why we let $k = 3$ and not $k = 1$.)

We choose $y_1 \in \mathrm{supp}(g)$, $y_1' \in \mathrm{supp}(g)$, $y_2 \notin \mathrm{supp}(g)$, $y_2' = (y_1')^{g^{-1}}$. Then (by a brief
computation) $(y_1')^{[g,h]} \ne y_1'$, and so $[g, h]$ is not the identity.
The analysis goes on as before, except we obtain

\[ \mathbb{E}(|S \cap S^\sigma|) \le 1 + (1 + O^*(\epsilon)) \frac{|S|^2}{n - 2} \]

(or $2 + (1 + O^*(\epsilon))|S|^2/(n - 2)$ for general $S$; we are using the fact that $y_2$ and $y_2'$
are never both in $S$ in our case) and so

\[ \mathbb{E}(|\mathrm{supp}([g, h])|) \le 3 + 3(1 + O^*(\epsilon)) \frac{|\mathrm{supp}(g)|^2}{n - 2}. \]

Thus, there is a $\sigma \in (A \cup A^{-1})^{\ell'}$ such that $g' = [g, h] = [g, \sigma^{-1}g\sigma]$ has support
$\le 3 + 3(1 + O^*(\epsilon))|\mathrm{supp}(g)|^2/(n - 2)$. □

Corollary 5.4 ([BBS04]). Let $A \subset \mathrm{Sym}(n)$ ($A = A^{-1}$) generate a 3-transitive
subgroup $G$ of Sym(n). Assume there is a $g \in A$, $g \ne e$, with $|\mathrm{supp}(g)| \le (1/3 - \epsilon)n$,
$\epsilon > 0$. Then

\[ \mathrm{diam}(\Gamma(G, A)) \ll_\epsilon n^{c_1}(\log n)^{c_2}, \]

where $c_1 = 8$ and $c_2$ is absolute.

In a side remark, [BBS04] claims $c_1 = 7$, but Babai has privately acknowledged a trivial
oversight there.

Proof. Apply Prop. 5.3, then apply it again with $g'$ instead of $g$, and again and again.
After $O(\log \log n)$ steps, we will have obtained an element $g_1 \in A^{\ell_1}$, $\ell_1 \ll_\epsilon n^7(\log n)^{O(1)}$,
such that $g_1 \ne e$ and $|\mathrm{supp}(g_1)| \le 3$. A brief argument suffices to show that $A^{n^3}$ acts
3-transitively (i.e., for any two 3-tuples of distinct elements of {1, 2, ..., n}, there is
an element of $A^{n^3}$ taking one to the other). Hence either all 2-cycles or all 3-cycles are
in $A^{\ell_1 + 2n^3}$ (in that they can be obtained by conjugating $g_1$ by elements of $A^{n^3}$). If at
least one element $h$ of $A$ is not in Alt(n), it is easy to construct a 2-cycle (and hence
all 2-cycles) by using $h$ and some well-chosen 3-cycles. We then construct every
element of $\langle A \rangle$ (which, without having meant to, we are now showing to be either
Alt(n) or Sym(n)) as a product of length at most $n$ in our 2-cycles (if $A \not\subset \mathrm{Alt}(n)$)
or 3-cycles (if $A \subset \mathrm{Alt}(n)$). □
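
The claim that $O(\log\log n)$ applications of Prop. 5.3 suffice can be seen numerically by iterating the bound on the support. The sketch below iterates $s \mapsto 3 + 3s^2/(n-2)$, dropping the factor $1 + O^*(\epsilon)$ (a simplifying assumption of this illustration), until the bound falls below 4, at which point an integer-valued support is at most 3. The function name is hypothetical.

```python
def steps_until_small_support(n, s0):
    """Count iterations of s -> 3 + 3 s^2 / (n - 2) needed to reach s < 4."""
    s, steps = float(s0), 0
    while s >= 4:
        s_next = 3 + 3 * s * s / (n - 2)
        if s_next >= s:
            # happens when s0 is not comfortably below n/3
            raise ValueError("the bound does not decrease from this starting support")
        s, steps = s_next, steps + 1
    return steps

steps_small = steps_until_small_support(10**6, 3 * 10**5)     # support 0.3 n
steps_large = steps_until_small_support(10**12, 3 * 10**11)   # support 0.3 n
```

Squaring the value of n adds only about one step, which is the doubly logarithmic behaviour used in the proof.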

There is clearly some double-counting going on in the proof of Prop. 5.3. A more
careful counting argument gives an improved statement that results in a version
of Cor. 5.4 with 1/2 instead of 1/3. Redoing Prop. 5.3 with well-chosen words
other than $[g, h]$ results in still better bounds; [BGH+] gives $\mathrm{diam}(\Gamma(G, A)) \ll n^{O(1)}$
provided that there is a $g \in A$, $g \ne e$, with $|\mathrm{supp}(g)| \le 0.63n$.

The moral of this section is that short random walks can be enough for "the
probabilistic method" in combinatorics (showing existence by showing positive
probability) to work, in that they serve to approximate the uniform distribution on k-tuples
($k$ small) very well.

5.3. Large orbits, pointwise stabilizers and stabilizer chains. The following 
result was a key part of the proof of Babai's elementary bound on the size of doubly 
transitive permutation groups other than Alt(n) and Sym(n) [Bab82] . It has a 
probabilistic proof. 

Lemma 5.5 (Babai's splitting lemma [Bab82]). Let $H \le \mathrm{Sym}(n)$ be doubly
transitive. Let $\Sigma \subset \{1, 2, \dots, n\}$. Assume that there are at least $\rho n(n-1)$ ordered
pairs $(\alpha, \beta)$ of distinct elements of {1, 2, ..., n} such that there is no $g \in H_{(\Sigma)}$
with $\alpha^g = \beta$. Then there is a subset $S$ of $H$ with

\[ H_{(\Sigma^S)} = \{e\} \]

and $|S| \ll_\rho \log n$.

(Doubly transitive means that, for any two pairs $(x, y)$, $(x', y')$ of distinct elements of
{1, 2, ..., n}, there is a $g \in H$ such that $(x, y)^g = (x', y')$.)

Proof. Let $\alpha, \beta$ be distinct elements of {1, 2, ..., n}. Let $h \in H$. Suppose there is a
$g' \in H_{(\Sigma^h)}$ such that $\alpha^{g'} = \beta$. Then $g = hg'h^{-1}$ is an element of $H_{(\Sigma)}$ taking $\alpha^{h^{-1}}$
to $\beta^{h^{-1}}$.

The elements $h$ of $H$ such that $h^{-1}$ takes $(\alpha, \beta)$ to a given pair $(\alpha', \beta')$ of distinct
elements of {1, 2, ..., n} form a coset of $H_{(\alpha,\beta)}$. Hence, if we choose an element
$h \in H$ at random, $h^{-1}$ is equally likely to take $(\alpha, \beta)$ to any given pair $(\alpha', \beta')$. In
particular, the probability that it will take $(\alpha, \beta)$ to a pair $(\alpha', \beta')$ such that there is
no $g \in H_{(\Sigma)}$ taking $\alpha'$ to $\beta'$ is at least $\rho$. By what we were saying, this would imply
that there is no $g' \in H_{(\Sigma^h)}$ such that $\alpha^{g'} = \beta$.

Now take a set $S$ of $r$ elements of $H$ taken uniformly at random. For a given $(\alpha, \beta)$,
the probability that, for every $h \in S$, there is a $g' \in H_{(\Sigma^h)}$ such that $\alpha^{g'} = \beta$ is at
most $(1 - \rho)^r$. There would be such a $g'$ for every $h \in S$ if there were a $g' \in H_{(\Sigma^S)}$
such that $\alpha^{g'} = \beta$ (since such a $g'$ would be good for every $h \in S$). Hence, the
probability that there is at least one pair $(\alpha, \beta)$ of distinct elements such that there
is a $g' \in H_{(\Sigma^S)}$ with $\alpha^{g'} = \beta$ is at most $n^2(1 - \rho)^r$. For any $r > 2(\log n)/\log(1/(1 - \rho))$,
we get $n^2(1 - \rho)^r < 1$, and thus there exists a set $S$ of at most $r$ elements such that
there is no such pair $(\alpha, \beta)$. If there is no such pair, then the only possible element
of $H_{(\Sigma^S)}$ is the identity, i.e., $H_{(\Sigma^S)} = \{e\}$. □
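
The final union-bound step is easy to evaluate numerically. The following sketch computes the smallest integer r allowed by the condition $r > 2(\log n)/\log(1/(1-\rho))$ and checks that $n^2(1-\rho)^r < 1$ for it; the function name and the sample values of n and ρ are assumptions chosen for illustration.

```python
import math

def splitting_set_size(n, rho):
    """Smallest integer r with r > 2 log(n) / log(1 / (1 - rho))."""
    return math.floor(2 * math.log(n) / math.log(1 / (1 - rho))) + 1

n, rho = 1000, 0.5
r = splitting_set_size(n, rho)
failure_bound = n**2 * (1 - rho)**r                    # union bound over at most n^2 pairs
previous_bound = n**2 * (1 - rho)**(r - 1)
```

For fixed ρ the value of r grows like a constant times log n, which is the bound $|S| \ll_\rho \log n$ in the statement.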

We wish to adapt Lem. 5.5 to hold for subsets $A \subset G$ instead of subgroups $H$.
Here one of our leitmotifs reappears, but undergoes a change. Adapting a result on
subgroups to hold for subsets is a recurrent idea that we have seen throughout this
survey. However, so far, we have usually done this by relaxing the condition that $A$
be a subgroup into the condition that $A \cdot A \cdot A$ not be much larger than $A$. This is a
tactic that often works when the underlying idea is basically quantitative, as is the
case, e.g., for the orbit-stabilizer theorem.

Another tactic consists in redoing an essentially constructive proof keeping track
of how many products are taken; this is, for example, how a lemma of Bochert's gets
adapted in [HS, Lem. 3.12]. To generalize Babai's splitting lemma, however, we will
follow a third tactic - namely, making a probabilistic proof into what we can call a
stochastic one, viz., one based on random walks, or, more generally, on a random
process.

We already saw how to use random walks in this way in §5.2 (Babai-Beals-Seress);
the idea is to approximate the uniform distribution by a short random walk, using
Lem. 5.2.

Lemma 5.6 (Splitting lemma for sets ([HS], Prop. 5.2)). Let $A \subset \mathrm{Sym}([n])$ with
$A = A^{-1}$, $e \in A$ and $\langle A \rangle$ 2-transitive. Let $\Sigma \subset [n]$. Assume that there are at
least $\rho n(n-1)$ ordered pairs $(\alpha, \beta)$ of distinct elements of $[n]$ such that there is no
g G (AL9"'i°g"J)(2) with a3 = 13. Then there is a subset S o/ ^ LS"" log nj ^^^/^ 

\[ (AA^{-1})_{(\Sigma^S)} = \{e\} \]

and $|S| \ll_\rho \log n$.

Passing from the statement of Lem. 5.5 to the statement of Lem. 5.6, some
instances of $H$ get replaced by $A$ and some get replaced by $A^k$, $k$ moderate. This
is what makes it possible to give a statement true for general sets $A$ without
assumptions on the size of $A \cdot A \cdot A$; of course, a moment's thought shows that the
statement is particularly strong when $A$ grows slowly. This is much as in Prop. 4.2
or Prop. 4.3.

Sketch of proof. Exercise. Adapt the proof of Lem. 5.5 using Prop. 5.3. It is useful
to note that $g A_{(\Sigma^g)} g^{-1} = (gAg^{-1})_{(\Sigma)}$. □

Let us see how to use Lem. 5.6. (This part of the argument is similar to that in
[Pyb93], which sharpened the bounds in [Bab82].) By pigeonhole, $(AA^{-1})_{(\Sigma^S)} = \{e\}$
can be the case only if $|A| \le n^{|\Sigma^S|}$, and that can happen only if

\[ |\Sigma| \gg_\rho \frac{\log |A|}{(\log n)^2}. \]

This means that there is a constant $c$ such that, for every $\Sigma \subset \{1, 2, \dots, n\}$ with
$|\Sigma| \le c(\log |A|)/(\log n)^2$, the assumption ("there are at least $\rho n(n-1)$ ordered
pairs...") in Lem. 5.6 does not hold (since the conclusion cannot hold).

Thus, for any $\sigma < 1$, we are guaranteed to be able to find $\Sigma = \{\alpha_1, \alpha_2, \dots, \alpha_m\}$,
m »^ (log |.4|)/(logn)2, such that, for A' = ylLSn^iognJ^ 

(5.3) \[ \left| \alpha_i^{A'_{(\alpha_1, \dots, \alpha_{i-1})}} \right| \ge \sigma n \]

for every $1 \le i \le m$; we are setting $\rho = 1 - \sigma$. (The use of stabilizer chains
$A > A_{(\alpha_1)} > A_{(\alpha_1, \alpha_2)} > \cdots$ goes back to the algorithmic work of Sims [Sim70],
[Sim71], as does the use of the size of the orbits in (5.3); see [Ser03, §4.1].)

By (5.3), $(A')^m$ occupies at least $(\sigma n)^m$ cosets of the pointwise stabilizer $\mathrm{Sym}(n)_{(\Sigma)}$
(exercise; [HS, Lem. 3.17]), out of $n!/(n-m)! \le n^m$ possible cosets of $\mathrm{Sym}(n)_{(\Sigma)}$ in
Sym(n). The number of cosets of $\mathrm{Sym}(n)_{(\Sigma)}$ in the setwise stabilizer $\mathrm{Sym}(n)_\Sigma$ is $m!$,
which is much larger than $n^m/(\sigma n)^m = (1/\sigma)^m$. (We can work with $\sigma = 9/10$, say.)
A hybrid of Lem. 3.3 and Lem. 3.4 ([HS, Lem. 3.7]) then shows immediately that
$(A')^{2m} \cap \mathrm{Sym}(n)_\Sigma$ intersects many ($\ge \sigma^m m!$) cosets of $\mathrm{Sym}(n)_{(\Sigma)}$ (and, in particular,
$|(A')^{2m} \cap \mathrm{Sym}(n)_\Sigma| \ge \sigma^m m!$).

Let us see what we have got. We have constructed many elements of $(A')^{2m} \subset
A^{n^{O(1)}}$ lying in a "special subgroup" $(\mathrm{Sym}(n))_\Sigma$. This is analogous to the situation
over linear algebraic groups, where we constructed many elements of $A^k$ lying in a
special subgroup $T = C(g)$ (Cor. 4.4). Moreover, the elements of $(A')^{2m} \cap (\mathrm{Sym}(n))_\Sigma$
will act (by conjugation) on an even more special subgroup, namely, $(\mathrm{Sym}(n))_{(\Sigma)}$.

This is a turning point in the proof of Thm. 1.3, just as Cor. 4.4 (or its weaker
version, [Hel08, Prop. 4.1]) had been a turning point in the proof of Thm. 1.1.

(Note the shift, or non-shift, in the meaning of "special" (dictated by the requirements of the
problems at hand). Before, a special subgroup was exactly that, namely, an algebraic subgroup
("special" meaning "lying on a set of positive codimension, algebraically speaking"). Here the
role of "special subgroups" is played by stabilizers (in relation to the natural action of Sym([n]) on
{1, 2, ..., n}, or powers thereof). The difference is, however, less than it seems at first: the algebraic
subgroup $T$ is also given as a stabilizer $C(g)$ (in relation to the action by conjugation of $G$ on itself,
which is a natural action of $G$ on an affine space).)

5.4. Constructing small generating sets. Let $A$ be a set of generators of Sym(n)
or Alt(n). The set $A$ may be large - inconveniently so for some purposes. Can we
find a set $S \subset A^\ell$ of bounded size ($\ell$ moderate) so that $S$ generates $\langle A \rangle$?

This is a question that arises in the course of the proof of Thm. 1.3. Addressing
it will give us the opportunity to show how to use stochastic processes other than a
simple random walk in order to put a generalized probabilistic method into practice.

Let us start with a simple lemma.

Lemma 5.7 ([HS, Lem. 4.3]). Let $A \subset \mathrm{Sym}([n])$, $e \in A$. Assume $\langle A \rangle$ is transitive.
Then there is a $g \in A^n$ such that $|\mathrm{supp}(g)| \ge n/2$.

Sketch of proof. For every $i \in [n]$, there is a $g_i \in A$ such that $i \in \mathrm{supp}(g_i)$ (why?).
Consider $g = g_1^{r_1} g_2^{r_2} \cdots g_n^{r_n}$, where $r_1, r_2, \dots, r_n \in \{0, 1\}$ are independent random
variables taking the values 0 and 1 with equal probability. Show that the expected
value of $|\mathrm{supp}(g)|$ is at least $n/2$ (exercise). □
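
The expectation in this sketch of proof can be computed exactly for a small example by enumerating all $2^n$ choices of the exponents $r_i$. The code below does so for n = 6; the encoding (points 0, ..., n-1, g[x] = x^g) and the choice of A (a transposition and a 6-cycle) are assumptions made for this illustration, and the computation is a numerical check, not a solution of the exercise.

```python
from fractions import Fraction
from itertools import product

def compose(a, b):
    """Right-action product ab: x^(ab) = (x^a)^b."""
    return tuple(b[a[x]] for x in range(len(a)))

def expected_support(gs):
    """Exact E|supp(g_1^{r_1} ... g_n^{r_n})| over independent fair bits r_i."""
    n = len(gs)
    identity = tuple(range(n))
    total = 0
    for bits in product((0, 1), repeat=n):
        g = identity
        for g_i, r_i in zip(gs, bits):
            if r_i:
                g = compose(g, g_i)
        total += sum(1 for x in range(n) if g[x] != x)
    return Fraction(total, 2 ** n)

n = 6
t = (1, 0, 2, 3, 4, 5)      # transposition (0 1)
c = (1, 2, 3, 4, 5, 0)      # 6-cycle
A = [t, c]                  # generates a transitive group
# for each point i, pick some g_i in A that moves i
gs = [next(g for g in A if g[i] != i) for i in range(n)]
expectation = expected_support(gs)
```

Since the expectation is at least n/2, at least one choice of exponents gives a product with support of size at least n/2, which is the conclusion of Lemma 5.7.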

We will be using $g$ to move an element of {1, 2, ..., n} around and another element
$h$ (produced by a random walk) in order to scramble {1, 2, ..., n}.

Proposition 5.8 ([HS, Lem. 4.5 and Prop. 4.6]). Let $A \subset \mathrm{Sym}(n)$ with $A = A^{-1}$,
$e \in A$ and $\langle A \rangle = \mathrm{Sym}(n)$ or Alt(n). Then there are $g \in A^n$ and $j, h \in A^{n^{O(\log n)}}$
such that $\langle g, j, h \rangle$ is transitive.

Extended sketch of proof . By Lem. \57l\ there is a g £ A"" with | supp((7)| > n/2. Let 
h £ A^, i = n'^^, k = [81ogn] (say) be the outcome of a lazy random walk on 
T{G,A) of length i (starting at e). We can assume n is larger than an absolute 
constant. 

We will consider words of the form

\[ f(\vec{a}) = h g^{a_1} h g^{a_2} \cdots h g^{a_k}, \]

where $a_1, \dots, a_k \in \{0, 1\}$. We wish to show that, for $\beta \in \{1, 2, \dots, n\}$ taken at
random (with the uniform distribution on {1, 2, ..., n}), the orbit $\beta^{f(\vec{a})}$, $\vec{a} \in \{0, 1\}^k$,
is likely to be very large ($\gg n/(\log n)^2$).

A simple sphere-packing bound shows that there is a set $V \subset \{0, 1\}^k$, $|V| \ge n$,
such that the Hamming distance between any two distinct elements of $V$ is at
least $k/5 > \log_2 n$. (Exercise.) We wish to show that, for $\beta$ random and $\vec{a}, \vec{a}' \in V$
distinct, it is very unlikely that $\beta^{f(\vec{a})}$ equal $\beta^{f(\vec{a}')}$.
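
A set V as in the sphere-packing step can be produced greedily for small parameters. The sketch below assumes the reading $k = \lceil 8 \log n \rceil$ (natural logarithm) for the length of the words, an assumption of this illustration, and picks binary words one at a time, keeping a word only if it is at Hamming distance at least k/5 from all words kept so far. The function names are hypothetical.

```python
import math
from itertools import product

def hamming(x, y):
    """Number of coordinates in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def greedy_code(k, min_dist, size):
    """Greedily pick `size` words of {0,1}^k at pairwise Hamming distance >= min_dist."""
    code = []
    for w in product((0, 1), repeat=k):
        if all(hamming(w, v) >= min_dist for v in code):
            code.append(w)
            if len(code) == size:
                return code
    raise ValueError("ran out of words before reaching the requested size")

n = 8
k = math.ceil(8 * math.log(n))            # assumed word length; 17 for n = 8
V = greedy_code(k, math.ceil(k / 5), n)   # n words at pairwise distance >= k/5
min_distance = min(hamming(x, y) for i, x in enumerate(V) for y in V[i + 1:])
```

The greedy procedure is the constructive form of the sphere-packing count: each chosen word excludes only a small Hamming ball, so many words can be chosen.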

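Write $\vec{a} = (a_1, a_2, \dots, a_k)$, $\vec{a}' = (a_1', a_2', \dots, a_k')$. Consider the sequences

(5.4) \[ \beta_0 = \beta, \ \beta_1 = \beta_0^{h g^{a_1}}, \ \beta_2 = \beta_1^{h g^{a_2}}, \ \dots, \ \beta_k = \beta_{k-1}^{h g^{a_k}}, \]
      \[ \beta_0' = \beta, \ \beta_1' = (\beta_0')^{h g^{a_1'}}, \ \beta_2' = (\beta_1')^{h g^{a_2'}}, \ \dots, \ \beta_k' = (\beta_{k-1}')^{h g^{a_k'}}. \]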

It is very unlikely that = (3 (probability ~ 1/n) or that P'^s = i.e., = pg-'' 
(probability ~ 1/n). If neither of these unlikely occurrences takes place, it is also 
very unlikely (total probability ^ 4/n) that Pi or equal /3 or /3^. The reason is 
that, since /3i has not been seen "before" (i.e., (/3,/3i) is a pair of distinct elements), 
the distribution of /i^^ is almost uniform, even conditionally on /3 = x, for any 
X. (This can be easily made rigorous; it is much as (for instance) in the proof of 
Prop. (531 right after (|5.2p .) Thus, the probability that /S^ = P (for example) is ~ 1/n 
(the same as the probability of /3j^ = x for any x other than p'^). Proceeding in this 
way, we obtain that it is almost certain (probability 1 — 0{k/n)) that /3i,/32, . . . , (3^ 
are all distinct. (Recall that, by Lemma 15.21 a random walk of length i mixes k- 
tuples (and even 2A;-tuples) of distinct elements. It is also relevant that k is very 
small compared to n, as this means that hitting one of k (or rather 2k) visited 

elements by picking an element of $[n]$ at random is highly unlikely. We can keep our
independence from the past as long as we do not go back to it.)

(The Hamming distance $d(x, y)$ between two elements $x, y \in \{0, 1\}^k$ is the number of indices
$1 \le i \le k$ such that $x_i \ne y_i$.)

Let us see what happens to /3q, /3j, . . . in the meantime. Start at i = and increase 

i by 1 repeatedly. As long as ai = a'^, we have Pi = I3[. As soon as Oj ^ a[ (denote by 

ii the first index i for which this happens), we may have (3i ^ j3[; this happens when 

£ supp(5(), i.e., it happens with probability ~ | supp(5()|/n. If this happens, 
then, by the same argument as above, it is highly likely that the two paths in (15. 4p 
diverge, i.e., /3j ^ /3j for all j > i, and, for that matter, that they also avoid each 
other's past {/3j ^ [i[ for all j > i and all I < j). (It is useful to keep track of the 
latter condition for the same reason as above, namely, to keep our independence 
from events that have already been determined.) 

Since a, a' are at Hamming distance at least n from each other, it is very unlikely 
that f3^_i G supp(5) for all i such that Oj / a- (probability < (1 — | supp((7)|/n)" < 
2~", since there are n such indices i). Hence the two paths almost certainly diverge 
- never to meet again, as we just showed; in particular, /S'^^") and /3'^^"' ^ are almost 
certainly distinct. They are distinct with probability > 1 — 0((logn)^/n) for any 
distinct a,a' £ V and (3 G [n] random, to be precise. 

By Cauchy-Schwarz, this implies that the expected value of $1/|\beta^{\langle g, h \rangle}|$ for $\beta$ random
is $O((\log n)^2/n)$. This implies, in turn, that the expected value of the number of
orbits of $\langle g, h \rangle$ is $O((\log n)^2)$. (Exercises.)

A third element $j \in \langle A \rangle$ obtained by a random walk of length $\ell$ almost certainly
merges these orbits, i.e., $\langle g, h, j \rangle$ is transitive. (Longer but easy exercise.) Hence
there exist (many) $g, h, j \in A^\ell$ such that $\langle g, h, j \rangle$ is transitive. □
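
The passage from the expected value of the reciprocal orbit size to the number of orbits rests on the identity (number of orbits) = $\sum_\beta 1/|\beta^{\langle g,h \rangle}|$, i.e., n times the average of the reciprocal orbit size over a uniformly random β. A minimal sketch checking this identity on a toy example follows; the encoding (points 0, ..., n-1, g[x] = x^g) and the two sample permutations are assumptions made for this illustration.

```python
from fractions import Fraction

def orbits(n, gens):
    """Orbits of the group generated by gens acting on {0, ..., n-1}."""
    seen, result = set(), []
    for start in range(n):
        if start in seen:
            continue
        orbit, frontier = {start}, [start]
        while frontier:
            x = frontier.pop()
            for g in gens:
                y = g[x]
                if y not in orbit:
                    orbit.add(y)
                    frontier.append(y)
        seen |= orbit
        result.append(orbit)
    return result

n = 6
g = (1, 0, 2, 3, 4, 5)      # swaps 0 and 1
h = (0, 1, 3, 2, 5, 4)      # swaps 2 and 3, and 4 and 5
orbs = orbits(n, [g, h])
orbit_of = {x: orb for orb in orbs for x in orb}
reciprocal_sum = sum(Fraction(1, len(orbit_of[x])) for x in range(n))
```

Each orbit of size s contributes s terms equal to 1/s, so the sum counts orbits; a bound of $O((\log n)^2/n)$ on the average reciprocal therefore gives $O((\log n)^2)$ orbits in expectation.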

Corollary 5.9 ([HS, Cor. 4.7]). Let $A \subset \mathrm{Sym}(n)$ with $A = A^{-1}$, $e \in A$ and $\langle A \rangle =
\mathrm{Sym}(n)$ or Alt(n). Then, for every $k \ge 1$, there is a set $S \subset A^{(3n)^k n^{O(\log n)}}$ of size at
most $3k$ such that $\langle S \rangle$ is k-transitive.

In particular, if we want $\langle S \rangle$ to be Sym(n) or Alt(n) (something we do not need in
the application in [HS]) then it is enough to set $k = 6$, as the Classification of Finite
Simple Groups implies that a 6-transitive group must be either Sym(n) or Alt(n).

Sketch of proof. Apply Prop. 5.8 repeatedly, using Schreier generators to pass to
pointwise stabilizers of {1}, {1, 2}, etc. □

How far can arguments such as those in the proof of Prop. 5.8 be pushed? Here
there is again a "classical" argument to be examined in the light of random processes
and random walks, namely, the work of Broder and Shamir on the spectral gap of
random graphs [BS87]. The ideas there and those in Prop. 5.8 are some of the
elements leading to [HSZ], which gives a bound of $O(n^2(\log n)^{O(1)})$ on the diameter
of $\Gamma(\langle A \rangle, A)$ for $A = \{g, h\}$, $g, h \in \mathrm{Sym}(n)$ random.

5.5. The action of the setwise stabilizer on the pointwise stabilizer. How is
this all put together to give Thm. 1.3? The entire argument is outlined in detail in
[HS, §1.5]. Here, let us go over a crucial step and look quickly at what then follows,
skipping some of the complications.

We are working with a set $A \subset \mathrm{Sym}(n)$ generating Alt(n) or Sym(n). By the
end of §5.3, we had constructed a large subset $\Sigma \subset \{1, 2, \dots, n\}$ ($m = |\Sigma| \gg
(\log |A|)/(\log n)^2$) such that $(A'')_\Sigma$ (where $A'' = (A')^{2m}$) intersects $\ge
\sigma^m m!$ cosets of $\mathrm{Sym}(n)_{(\Sigma)}$; in other words, the projection of $(A'')_\Sigma$ to $\mathrm{Sym}(\Sigma)$ (by
restriction to $\Sigma$) has $\ge \sigma^m m!$ elements.

By the trick of demanding (5.3) for $m + 1$ rather than $m$, we can ensure that
$\langle (A'')_{(\Sigma)} \rangle$ has at least one large orbit $\Gamma$ ($|\Gamma| \ge \sigma n$). We can actually assume that
$\langle (A'')_{(\Sigma)} \rangle$ acts as Sym or Alt on $\Gamma$, since otherwise we are done by a different
argument (called descent in [HS, §6], as in "infinite descent", because it is inductive; it
is also the one step that involves the Classification of Finite Simple Groups). Then,
by Cor. 5.9, there is a set $S \subset ((A'')^{n^{O(\log n)}})_{(\Sigma)}$, $|S| \le 6$, such that $\langle S \rangle$ acts as a
2-transitive group on $\Gamma$.

We now consider the action of the elements of $(A'')_\Sigma$ on the elements of $S$ by
conjugation. By the orbit-stabilizer principle (Lem. 3.1), either (a) there is an
element $g \neq e$ of $((A'')^2)_\Sigma$ commuting with every element of $S$, or (b) the orbit
$\{g s g^{-1} : g \in (A'')_\Sigma\}$ of some $s \in S$ is of size $\geq |(A'')_\Sigma|^{1/6} \geq (d^m m!)^{1/6}$. This orbit is
entirely contained in the pointwise stabilizer of $\Sigma$.

In case (a), $g$ must act trivially on $\Gamma$ (since it commutes with a 2-transitive group
on $\Gamma$) and so we are done by Cor. 5.4 (Babai-Beals-Seress). In case (b), we have
succeeded in constructing many elements of a power of $A$ in the pointwise stabilizer of
the set $\Sigma$.
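
The dichotomy between cases (a) and (b) is a pigeonhole argument: if no two elements $x \neq y$ of a set $X$ conjugate every $s \in S$ in the same way, then $x \mapsto (x s x^{-1})_{s \in S}$ is injective, so the product of the orbit sizes is at least $|X|$ and some orbit has size at least $|X|^{1/|S|}$; otherwise $y^{-1} x \neq e$ commutes with every element of $S$. The following Python sketch checks this on a toy example. The function name, the choice $X = \mathrm{Sym}(4)$ and the sets $S$ are ours and serve only as an illustration.

```python
from itertools import permutations


def compose(p, q):
    """Permutation 'q first, then p'; permutations are tuples on 0..n-1."""
    return tuple(p[q[i]] for i in range(len(p)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def conjugation_dichotomy(X, S):
    """Either ('commuting', g) with g != e in X^{-1}X commuting with all of S,
    or ('orbit', s, size), where {x s x^{-1} : x in X} has `size` elements
    and size ** len(S) >= len(X). X must consist of distinct permutations."""
    seen = {}
    for x in X:
        x_inv = inverse(x)
        key = tuple(compose(x, compose(s, x_inv)) for s in S)
        if key in seen:
            # x and y = seen[key] conjugate every s identically,
            # so y^{-1} x commutes with every s; it is != e because x != y.
            return ("commuting", compose(inverse(seen[key]), x))
        seen[key] = x
    # The map x -> key is injective: the product of orbit sizes is >= |X|.
    sizes = [len({key[i] for key in seen}) for i in range(len(S))]
    best = max(range(len(S)), key=lambda i: sizes[i])
    return ("orbit", S[best], sizes[best])


X = list(permutations(range(4)))
S = [(1, 0, 2, 3), (1, 2, 3, 0)]    # generates Sym(4), whose center is trivial
large_orbit = conjugation_dichotomy(X, S)
S2 = [(1, 0, 3, 2)]                 # a double transposition: large centralizer
commuting = conjugation_dichotomy(X, S2)
```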

This does not mean we are done yet; perhaps there were already many elements
of $A''$ in the pointwise stabilizer of $\Sigma$. (Otherwise, Lem. 3.3 does mean that $A$
is growing rapidly, and so we are done.) However, having many elements in the
pointwise stabilizer of $\Sigma$ does mean that we can now start an iteration, constructing
a second set $\Sigma_2$ and a longer stabilizer chain satisfying (5.3) with $A$ replaced by
$\left(A^{n^{O(\log n)}}\right)_{(\Sigma)}$, and then a third set $\Sigma_3$, and so on. Instead of focusing on
making $A$ grow, we focus on making the length of the stabilizer chain grow, until it
reaches size about $n$, at which point we are done.

6. Some open problems 
The following questions are hard and far from new. 

(a) Consider all Cayley graphs $\Gamma(G, A)$ with $G = \mathrm{SL}_2(\mathbb{F}_p)$, $A \subset G$, $|A| = 2$,
$\langle A \rangle = G$, $p$ arbitrary. Are they all $\epsilon$-expanders for some fixed $\epsilon > 0$?

(As [LW93, p. 96] says, an affirmative answer was made plausible by the
experiments in [LR92]. The proof in [BG08c] is valid for $A = A_0 \bmod p$ ($A_0$
fixed) and also for $A$ random (with probability 1), among other cases; see also
[BG10]. Expansion for $A = A_0 \bmod p$, $A_0$ fixed, is known for all non-abelian
simple groups of Lie type $G$ (of bounded rank) thanks to [GV12]; a proof
of expansion has also been announced for such $G$ and $A$ random [BGGT].
Expansion has been conjectured for general $G$ of bounded rank and arbitrary
$A$; see, e.g., [Lub12, Conj. 2.29].)
(b) Does every Cayley graph $\Gamma(G, A)$ with $G = \mathrm{SL}_n(\mathbb{F}_p)$, $A$ a set of generators
of $G$ ($n$ and $p$ arbitrary) have diameter $O((\log|G|)^C)$, where $C$ is an absolute
constant?

(This is Babai's conjecture in the case of linear algebraic groups. The cases
$n = 2$, $n = 3$ were proven in [Hel08], [Hel11]. Both [BGT11] and [PSa] give
this result with $C$ depending on $n$.)

(c) Does every Cayley graph $\Gamma(\mathrm{Sym}(n), A)$ ($A$ a set of generators of Sym(n))
have diameter $n^{O(1)}$?

(This predates Babai's more general conjecture. Here Thm. 1.3 (as in
[HS]) is the best result to date.)

(d) Let $A$ consist of two random elements of Alt(n). Is $\Gamma(\mathrm{Alt}(n), A)$ an
$\epsilon$-expander, $\epsilon > 0$ fixed? Is the diameter of $\Gamma(\mathrm{Alt}(n), A)$ at most $n (\log n)^{O(1)}$?

(A "yes" to the former question implies a "yes" to the latter, but there is 
no consensus on what the answer to either question should be.) 

(e) ("Navigation") Given a set $A$ of generators of $G = \mathrm{SL}_2(\mathbb{F}_p)$ ($|A| = 2$ if you
wish) and an element $g$ of $G$, can you find in time $O((\log p)^c)$ a product of
length $O((\log p)^c)$ of elements of $A \cup A^{-1}$ equal to $g$?

(A probabilistic algorithm for a specific $A$ is given in [Lar03].)

One of the difficulties in answering question (c) resides in the fact that a statement
such as Thm. 1.1 cannot be true for all subsets $A$ of a symmetric group $G$ (see the
footnote below). What happens if the conditions on $A$ are strengthened? In a first draft
of the present text, the author asked whether $r$-transitivity is enough. That is: let
$A \subset G$ ($G = \mathrm{Sym}(n)$ or $G = \mathrm{Alt}(n)$) be a set of generators of $G$; assume that $A$ is
$r$-transitive, meaning that, for any two $r$-tuples $v_1$, $v_2$ of distinct elements of
$\{1, 2, \dotsc, n\}$, there is a $g \in A$ such that $g$ takes $v_1$ to $v_2$. If $r$ is greater than a
constant (say 6), does it follow that

(6.1) either $|A^3| \geq |A|^{1+\delta}$ or $A^k = G$,

where $k$ and $\delta > 0$ are absolute constants? L. Pyber promptly showed that the answer
is "no": let $A$ be the union of any large subgroup $H < G$ and the set of all $2r$-cycles;
then $|A^3| \leq n^{4r} |A|$, and this is much smaller than $|A|^{1+\delta}$ for $H$ large.

What if $A \subset G$ is of the form $A = \dots$, where $|B| = O(1)$? Is this a sufficient
condition for (6.1) to hold? It is easy to see that a "yes" answer here, together with
a stronger version of Prop. 5.8 (with $j, h \in \dots$ instead of $j, h \in \dots$), would
imply a "yes" answer to (c) above.

A separate, more open-ended question is that of the relevance of [HS] to the study
of linear algebraic groups. As we discussed before, the problem of proving growth
in Alt(n) is closely related to that of proving growth in $\mathrm{SL}_n$ uniformly as $n \to \infty$.

Challenge. Apply the ideas in [HS] to question (b) above.



Footnote: Both Pyber and Spiga have given counterexamples. The following counterexample is due to
Pyber: let $G = \mathrm{Sym}(2n + 1)$ and $A = H \cup \{\sigma, \sigma^{-1}\}$, where $\sigma$ is the shift $m \mapsto m + 2 \bmod 2n + 1$
and $H$ is the subgroup generated by all transpositions $(2i, 2i + 1)$ with $1 \leq i \leq n$. Then $|AAA| \ll |A|$.
See also [PPSS12, §3] and [Spi12].
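
Pyber's counterexample can be checked by brute force for small $n$. The sketch below is ours and rests on two assumptions: the transpositions generating $H$ are read as $(2i, 2i + 1)$, $1 \leq i \leq n$, and the shift is called $\sigma$. With that reading, $\sigma H \sigma^{-1}$ meets $H$ in a subgroup of index 2, so $|H \sigma H| = 2|H|$, and counting the 27 kinds of words of length three gives $|AAA| \leq 21 |H| + 8 \leq 21 |A|$ for every $n$.

```python
from itertools import product


def compose(p, q):
    """Permutation 'q first, then p'; permutations are tuples on 0..N-1."""
    return tuple(p[q[i]] for i in range(len(p)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def pyber_set(n):
    """A = H ∪ {σ, σ^{-1}} in Sym(2n+1); label m is stored at index m - 1."""
    size = 2 * n + 1
    sigma = tuple((i + 2) % size for i in range(size))  # the shift m -> m + 2
    H = set()
    # H is generated by n disjoint transpositions, so |H| = 2^n.
    for choice in product((False, True), repeat=n):
        p = list(range(size))
        for i, swap in enumerate(choice, start=1):
            if swap:
                a, b = 2 * i - 1, 2 * i  # indices of the labels 2i and 2i + 1
                p[a], p[b] = p[b], p[a]
        H.add(tuple(p))
    return H | {sigma, inverse(sigma)}


def triple_product(A):
    """The set AAA of all products of three elements of A."""
    AA = {compose(x, y) for x in A for y in A}
    return {compose(x, y) for x in AA for y in A}


ratios = {}
for n in range(2, 6):
    A = pyber_set(n)
    ratios[n] = len(triple_product(A)) / len(A)
```

The values in `ratios` stay bounded as $n$ grows, whereas $|A| = 2^n + 2$ is far smaller than $|G| = (2n + 1)!$, so $A^k \neq G$ for any bounded $k$.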
Finally, let us end with a question for which the time is arguably ripe, but for
which there is still no full answer. The idea is to give a full description of subsets
$A$ that fail to grow.

Conjecture 1. Let $K$ be a field. Let $A$ be a finite subset of $\mathrm{GL}_n(K)$ with $A = A^{-1}$,
$e \in A$. Then, for every $R \geq 1$, either

(a) $|A^3| \geq R|A|$, or else

(b) there are two subgroups $H_1 \leq H_2$ in $\mathrm{GL}_n(K)$ and an integer $k = O_n(1)$ such
that

• $H_1$ and $H_2$ are both normal in $\langle A \rangle$, and $H_2/H_1$ is nilpotent,

• $A^k$ contains $H_1$, and

• $|A^k \cap H_2| \geq R^{-O_n(1)} |A|$.

This conjecture was made fairly explicitly in [Hel11] (see comments after [Hel11,
Thm. 1.1]), where it was also proven for $n = 3$ and $K = \mathbb{F}_p$ (in a slightly weaker form).
The same conjecture was proven for $n$ general and $K = \mathbb{F}_p$ as [GH, Thm. 2] (joint
with Pyber and Szabó). Breuillard, Green and Tao have given to this conjecture the
name of Helfgott-Lindenstrauss conjecture; in [BGT12], they proved a qualitative
version with non-explicit bounds (valid even for non-algebraic groups). The case
of $n$ general and $K$ general, as stated here, remains open. A somewhat weaker
version (for $n$ and $K$ general, but with $H_2/H_1$ soluble rather than nilpotent) has
been proven by Pyber and Szabó [PSb, Thm. 8].
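
As a toy illustration of the two alternatives in Conjecture 1, the following Python sketch compares a set that does not grow with one that does, inside $\mathrm{SL}_2(\mathbb{F}_7) \subset \mathrm{GL}_2(\mathbb{F}_7)$. The prime $p = 7$, the two sets and all names are our choices for the sake of the example. The unipotent subgroup satisfies $A^3 = A$, so alternative (a) fails for every $R > 1$ and alternative (b) holds with $H_1 = H_2 = A$; the symmetric generating set $\{e, u^{\pm 1}, l^{\pm 1}\}$ grows until it fills the group.

```python
def mat_mul(x, y, p):
    """Product of 2x2 matrices (a, b, c, d) = [[a, b], [c, d]] over Z/pZ."""
    a, b, c, d = x
    e, f, g, h = y
    return ((a * e + b * g) % p, (a * f + b * h) % p,
            (c * e + d * g) % p, (c * f + d * h) % p)


def set_power(A, k, p):
    """The set A^k of all products of k elements of A."""
    result = set(A)
    for _ in range(k - 1):
        result = {mat_mul(x, y, p) for x in result for y in A}
    return result


p = 7
order = p * (p * p - 1)  # |SL_2(F_p)|

# A subgroup: upper triangular unipotent matrices.
subgroup = {(1, t, 0, 1) for t in range(p)}
# A symmetric generating set containing the identity.
generators = {(1, 0, 0, 1), (1, 1, 0, 1), (1, p - 1, 0, 1),
              (1, 0, 1, 1), (1, 0, p - 1, 1)}

subgroup_cubed = set_power(subgroup, 3, p)
generators_cubed = set_power(generators, 3, p)

# Least k with A^k = G; since e is in A, the sets A^k are increasing.
k, ball = 1, set(generators)
while len(ball) < order:
    ball = {mat_mul(x, y, p) for x in ball for y in generators}
    k += 1
```

Here `len(generators_cubed) / len(generators)` plays the role of the growth factor compared with $R$ in alternative (a), and the final value of `k` is the diameter of the corresponding Cayley graph.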